I am trying to decode a PDF I stored as a BLOB and save it to a file with a .pdf extension. results[0][1] holds the BLOB data extracted from the database query.
blob_val = results[0][1]
if len(blob_val) % 4 != 0:
    while len(blob_val) % 4 != 0:
        blob_val = blob_val + b"="
    decod_text = base64.b64decode(blob_val)
else:
    decod_text = base64.b64decode(blob_val)
Even though I have added "=" at the end to correct the padding, it still raises an incorrect padding error. Why does it still show this error when the padding has been corrected with "="?
Each base64 character encodes six bits, so a decoder consumes its input in groups of four characters, with every four characters producing three bytes of output. For decoding to work, the length of the encoded string has to be a multiple of four, and the "=" padding is exactly what brings it up to that length.
A slightly simplified version of the padding step:
blob_val = results[0][1]
# Pad the encoded string to a multiple of 4 characters. If the length
# is already a multiple of 4, the 'while' body never runs, so the extra
# 'if' around it is unnecessary.
while len(blob_val) % 4 != 0:
    blob_val += b"="
decod_text = base64.b64decode(blob_val)
If the "Incorrect padding" error persists even after this, the BLOB is most likely not valid base64 at all, for example because it already contains the raw PDF bytes (check whether it starts with b"%PDF"), or because its length leaves a remainder of 1 when divided by 4, which no amount of "=" padding can fix.
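Assuming the decode succeeds, writing the result out as a PDF is then just a binary file write (the output filename below is only an example):
# decod_text holds the decoded bytes from the step above
with open("output.pdf", "wb") as f:
    f.write(decod_text)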
I am running this hashlib code and it runs almost all the way:
def generate_hashes(peaks, fan_value=DEFAULT_FAN_VALUE):
    if PEAK_SORT:
        sorted(peaks, key=itemgetter(1))
    # bruteforce all peaks
    peaks = list(peaks)
    len_peaks = len(peaks)
    for i in range(len_peaks):
        for j in range(1, fan_value):
            if (i + j) < len(peaks):
                # take current & next peak frequency value
                freq1 = peaks[i][IDX_FREQ_I]
                freq2 = peaks[i + j][IDX_FREQ_I]
                # take current & next peak time offset
                t1 = peaks[i][IDX_TIME_J]
                t2 = peaks[i + j][IDX_TIME_J]
                # get diff of time offsets
                t_delta = t2 - t1
                # check if delta is between min & max
                if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
                    h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
                    yield (h.hexdigest()[0:FINGERPRINT_REDUCTION], t1)
However, it returns this error:
h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
TypeError: Unicode-objects must be encoded before hashing
I am honestly completely lost and don't know how to fix it. If you guys have any follow up questions regarding details about the code I will try my best to answer. Any feedback would be appreciated.
The answer is in the error message: use encode on your text string before hashing.
h = hashlib.sha1(("%s|%s|%s" % (str(freq1), str(freq2), str(t_delta))).encode('utf-8'))
The reason this is necessary is that hashlib.sha1() requires a bytes object due to the way it works internally. Normal Python strings (since version 3.0) are made of Unicode codepoints, which don't necessarily fit into a single byte. They need an encoding, which defines how the translation between codepoints and bytes occurs. UTF-8 is the most popular encoding because it can handle every Unicode codepoint yet remains backwards compatible with older encodings like ASCII.
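A minimal, self-contained illustration (the values and the 20-character slice are made up; 20 simply stands in for FINGERPRINT_REDUCTION):
import hashlib

text = "170|221|3"                        # an example "freq1|freq2|t_delta" string
# hashlib.sha1(text) would raise the TypeError from the question
h = hashlib.sha1(text.encode('utf-8'))    # encode str -> bytes first
print(h.hexdigest()[0:20])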
I am trying to do a project for college which consists of sending images using two Arduino Due boards and Python. I have two scripts: one for the client (the one who sends the image) and one for the server (the one who receives the image). I know how to send the bytes and check if they are correct; however, I'm required to "split" the image into packages that have:
- a header that has a size of 8 bytes and must be in this order:
  - the first byte must say the payload size;
  - the next three bytes must say how many packages will be sent in total;
  - the next three bytes must say which package I'm currently at;
  - the last byte must contain a code to an error message;
- a payload containing data with a maximum size of 128 bytes;
- an end of package (EOP) sequence (in this case, 3 bytes).
I managed to create the end of package sequence and append it correctly to a payload in order to send it; however, I'm running into issues creating the header.
I'm currently trying to make the following loop:
with open(root.filename, 'rb') as f:
    picture = f.read()

picture_size = len(picture)
packages = ceil(picture_size/128)
last_pack_size = (picture_size)
EOPs = 0
EOP_bytes = [b'\x15', b'\xff', b'\xd9']

for p in range(1, packages):
    read_bytes = [None, int.to_bytes(picture[(p-1)*128], 1, 'big'),
                  int.to_bytes(picture[(p-1)*128 + 1], 1, 'big')]
    if p != packages:
        endrange = p*128 + 1
    else:
        endrange = picture_size
    for i in range((p-1)*128 + 2, endrange):
        read_bytes.append(int.to_bytes(picture[i], 1, 'big'))
        read_bytes.pop(0)
        if read_bytes == EOP_bytes:
            EOPs += 1
            print("read_bytes:", read_bytes)
            print("EOP_bytes:", EOP_bytes)

print("EOPs", EOPs)
I expect the server to receive the same number of packages that the client has sent, and at the end I need to join the packages to recreate the image. I can manage that part; I just need some help with creating the header.
Here is a demo of how to construct your header. It's not a complete solution, but given that you only asked for help constructing the header, it may be what you are looking for.
headerArray = bytearray()

def Main():
    global headerArray

    # Sample data
    payloadSize = 254     # 0 - 254
    totalPackages = 1
    currentPackage = 1
    errorCode = 101       # 0 - 254

    AddToByteArray(payloadSize, 1)     # the first byte must say the payload size;
    AddToByteArray(totalPackages, 3)   # the next three bytes must say how many packages will be sent in total;
    AddToByteArray(currentPackage, 3)  # the next three bytes must say which package I'm currently at;
    AddToByteArray(errorCode, 1)       # the last byte must contain a code to an error message;

def AddToByteArray(value, numberOfBytes):
    global headerArray
    allocate = value.to_bytes(numberOfBytes, 'little')
    headerArray += allocate

Main()

# Output
print(f"Byte Array: {headerArray}")
for i in range(0, len(headerArray)):
    print(f"Byte Position: {i} Value: {headerArray[i]}")
Obviously I have not included the logic to obtain the current package or total packages.
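For what it's worth, here is a sketch of the same 8-byte header built with struct, plus the payload and the EOP sequence from the question; the big-endian byte order and the helper name are my own assumptions, so adjust them to whatever the receiving side expects:
import struct

EOP = b'\x15\xff\xd9'   # end-of-package sequence from the question

def build_packet(payload, total_packages, current_package, error_code=0):
    # 1 byte payload size, 3 bytes total packages, 3 bytes current package,
    # 1 byte error code -> 8 header bytes. Big-endian is an assumption here.
    header = struct.pack('>B', len(payload))
    header += total_packages.to_bytes(3, 'big')
    header += current_package.to_bytes(3, 'big')
    header += struct.pack('>B', error_code)
    return header + payload + EOP

# Example: packet 1 of 3 carrying a 128-byte payload
packet = build_packet(b'\x00' * 128, total_packages=3, current_package=1)
print(len(packet))   # 8 + 128 + 3 = 139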
I am new to Python and we have been trying to use LZW code from GitHub in our program.
https://github.com/joeatwork/python-lzw/blob/master/lzw/__init__.py
This works well with a smaller blob, but as the blob size increases it no longer decompresses the blob fully. I have been reading the documentation, but I am unable to understand the part below, which might be the reason why the full blob is not getting decompressed.
I have also attached a snippet of the Python code I am using.
Our control codes are
- CLEAR_CODE (codepoint 256). When this code is encountered, we flush
the codebook and start over.
- END_OF_INFO_CODE (codepoint 257). This code is reserved for
encoder/decoders over the integer codepoint stream (like the
mechanical bit that unpacks bits into codepoints)
When dealing with bytes, codes are emitted as variable
length bit strings packed into the stream of bytes.
codepoints are written with varying length
- initially 9 bits
- at 512 entries 10 bits
- at 1025 entries at 11 bits
- at 2048 entries 12 bits
- with max of 4095 entries in a table (including Clear and EOI)
code points are stored with their MSB in the most significant bit
available in the output character.
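If I am reading that correctly, the bit width of each emitted code grows with the size of the code table, roughly like this (my own sketch of the quoted thresholds, not code from the library):
def code_width(table_size):
    # 9 bits initially, widening at 512, 1025 and 2048 entries,
    # with at most 4095 entries in the table (including CLEAR and EOI)
    if table_size < 512:
        return 9
    if table_size < 1025:
        return 10
    if table_size < 2048:
        return 11
    return 12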
My code snippet:
def decompress_without_eoi(buf):
    # Decompress LZW into bytes, ignoring the End of Information code
    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            # print(repr(exc))
            if 'End of information code' in repr(exc):
                # print('Ignoring EOI error..\n')
                pass
            else:
                raise
        return

    try:
        # print('Trying a join..\n')
        deblob = b''.join(gen())
    except Exception as exc2:
        # print(repr(exc2))
        # print('Trying byte by byte..')
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception as exc3:
            # print(repr(exc3))
            return b''.join(deblob)
    return deblob
# current function to deblob
def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]
        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True
        if type(blob) != bytes:
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]
        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            # print(row[0])
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))
This function is called as shown below:
nf['IS_BLOB'] = nf[['IS_BLOB','COMPRESSION']].apply(deblob3,axis=1)
I am writing a Python script to hide data in an image. It basically hides the message bits in the last two bits of the red channel of each pixel's RGB value in a .PNG. The script works fine for lowercase letters but fails when the message contains a full stop, producing this error:
Traceback (most recent call last):
  File "E:\Python\Steganography\main.py", line 65, in <module>
    print(unhide('coded-img.png'))
  File "E:\Python\Steganography\main.py", line 60, in unhide
    message = bin2str(binary)
  File "E:\Python\Steganography\main.py", line 16, in bin2str
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 6: invalid start byte
Here is my code:
from PIL import Image

def str2bin(message):
    binary = bin(int.from_bytes(message.encode('utf-8'), 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def hide(filename, message):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'
    data = list(image.getdata())
    newData = []
    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= 2
            pixel[0] <<= 2
            pixel[0] += int('0b' + binary[index:index+2], 2)
            pixel = tuple(pixel)
            index += 2
        newData.append(pixel)
    print(binary)
    image.putdata(newData)
    image.save('coded-' + filename, 'PNG')

def unhide(filename):
    image = Image.open(filename)
    data = image.getdata()
    binary = '0'
    index = 0
    while binary[-8:] != '00000000':
        binary += bin(data[index][0])[-2:]
        index += 1
    binary = binary[:-1]
    print(binary)
    print(index * 2)
    message = bin2str(binary)
    return message

hide('img.png', 'alpha.')
print(unhide('coded-img.png'))
Please help. Thanks!
There are at least two problems with your code.
The first problem is that your encoding can be misaligned by 1 bit since leading null bits are not included in the output of the bin() function:
>>> bin(int.from_bytes('a'.encode('utf-8'), 'big'))[2:]
'1100001'
# This string is of odd length and (when joined with the terminating '00000000')
# turns into a still odd-length '110000100000000' which is then handled by your
# code as if there was an extra trailing zero (so that the length is even).
# During decoding you compensate for that effect with the
#
# binary = binary[:-1]
#
# line. The latter is responsible for your stated problem when the binary
# representation of your string is in fact of even length and doesn't need
# the extra bit as in the below example:
>>> bin(int.from_bytes('.'.encode('utf-8'), 'big'))[2:]
'101110'
You had better pad your binary string to an even length by prepending an extra zero bit (when needed).
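For example, something along these lines in hide(), reusing str2bin() from your code (a sketch of the idea, not a full fix):
binary = str2bin(message)
if len(binary) % 2 != 0:
    # prepend a zero bit so the bit string splits cleanly into 2-bit chunks;
    # int(binary, 2) in bin2str() ignores a leading zero, so decoding is unchanged
    binary = '0' + binary
binary += '00000000'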
The other problem is that, while restoring the hidden message, the stopping condition binary[-8:] == '00000000' can be satisfied prematurely when the trailing bits of one (partially restored) symbol are concatenated with the leading bits of the next. This can happen, for example, in the following cases:
the symbol @ (with ASCII code 64, i.e. its 6 low-order bits unset) followed by any character having an ASCII code value less than 64 (i.e. with its 2 highest-order bits unset);
a space character (with ASCII code=32, i.e. 4 low order bits unset) followed by a linefeed/newline character(with ASCII code=10, i.e. 4 high order bits unset).
You can fix that bug by requiring that a full byte is decoded at the time when the last 8 bits appear to all be unset:
while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
# ...
In my Django application I have a hierarchical URL structure:
webpage.com/property/PK/sub-property/PK/ etc...
I do not want to expose primary keys and create a vulnerability. Therefore I am
encrypting all PKs into strings in all templates and URLs. This is done by the wonderful library django-encrypted-id written by this SO user.
However, the library supports up to 2^64 long integers and produces 24 characters output (22 + 2 padding). This results in huge URLs in my nested structure.
Therefore, I would like to patch the encrypting and decrypting functions and try to shorten the output. Here is the original code (+ padding handling which I added):
# Remove the padding after encode and add it on decode
PADDING = '=='

def encode(the_id):
    assert 0 <= the_id < 2 ** 64
    crc = binascii.crc32(bytes(the_id)) & 0xffffff
    message = struct.pack(b"<IQxxxx", crc, the_id)
    assert len(message) == 16
    cypher = AES.new(
        settings.SECRET_KEY[:24], AES.MODE_CBC,
        settings.SECRET_KEY[-16:]
    )
    return base64.urlsafe_b64encode(cypher.encrypt(message)).rstrip(PADDING)

def decode(e):
    if isinstance(e, basestring):
        e = bytes(e.encode("ascii"))
    try:
        e += str(PADDING)
        e = base64.urlsafe_b64decode(e)
    except (TypeError, AttributeError):
        raise ValueError("Failed to decrypt, invalid input.")
    for skey in getattr(settings, "SECRET_KEYS", [settings.SECRET_KEY]):
        cypher = AES.new(skey[:24], AES.MODE_CBC, skey[-16:])
        msg = cypher.decrypt(e)
        crc, the_id = struct.unpack("<IQxxxx", msg)
        if crc != binascii.crc32(bytes(the_id)) & 0xffffff:
            continue
        return the_id
    raise ValueError("Failed to decrypt, CRC never matched.")

# Let's test with big numbers
for x in range(100000000, 100000003):
    ekey = encode(x)
    pk = decode(ekey)
    print "Pk: %s Ekey: %s" % (pk, ekey)
Output (I changed the strings a bit, so don't try to hack me :P):
Pk: 100000000 Ekey: GNtOHji8rA42qfq3p5gNMI
Pk: 100000001 Ekey: tK6RcAZ2MrWmR3nB5qkQDe
Pk: 100000002 Ekey: a7VXIf8pEB6R7XvqwGQo6W
I have tried to modify everything in the encode() function, but without any success. The produced string always has a length of 22.
Here is what I want:
- Keep the encryption strength near the original level, or at least do not decrease it dramatically.
- Support integers up to 2^48 (~281 trillion), or even 2^40; the current 2^64 is more than needed, and I do not think we will ever have such huge PKs in the database.
- I will be happy with a string length between 14 and 20. Even if it's 20, that's still 2 characters less.
Currently you are using CBC mode with a static IV, so the code you have isn't secure anyway and, like you say, produces rather large ciphertexts.
I would recommend swapping from CBC mode to CTR mode, which lets you have a variable-length IV. The normal recommended length for the IV (or nonce) in CTR mode, I think, is 12 bytes, but you can adjust this up or down as needed. CTR also effectively turns AES into a stream cipher, which means what you put in is what you get out in terms of size. With AES, CBC mode will always return ciphertexts in blocks of 16 bytes, so even if you are encrypting 6 bytes you get 16 bytes out, which isn't ideal for you.
If you make your IV, say, 48 bits long and aim to encrypt no more than 48 bits, you'll be able to produce a raw output of 6 + 6 = 12 bytes, or with base64, 4 * (12 / 3) = 16 characters. You can get a smaller output than this by further reducing your IV and/or input size (2^40?). You can lower the range of possible input values as much as you want without damaging the security.
Keep in mind that CTR does have pitfalls. Producing two ciphertexts that share the same IV and key means that they can be trivially broken, so always randomly generate your IV (and don't reduce it in size too much).
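A rough sketch of that approach, assuming the PyCryptodome package and Python 3; the key below is a placeholder and the CRC integrity check from the original code is omitted:
import base64
import os
import struct
from Crypto.Cipher import AES   # PyCryptodome

KEY = b"0123456789abcdef0123456789abcdef"   # placeholder 32-byte key

def encode_short(the_id):
    assert 0 <= the_id < 2 ** 48
    nonce = os.urandom(6)                            # random 48-bit nonce, sent with the token
    cypher = AES.new(KEY, AES.MODE_CTR, nonce=nonce)
    ct = cypher.encrypt(struct.pack("<Q", the_id)[:6])   # low 6 bytes = 48-bit id
    # 6-byte nonce + 6-byte ciphertext = 12 bytes -> 16 url-safe chars, no '=' padding
    return base64.urlsafe_b64encode(nonce + ct)

def decode_short(token):
    raw = base64.urlsafe_b64decode(token)
    nonce, ct = raw[:6], raw[6:]
    cypher = AES.new(KEY, AES.MODE_CTR, nonce=nonce)
    return struct.unpack("<Q", cypher.decrypt(ct) + b"\x00\x00")[0]

for x in range(100000000, 100000003):
    token = encode_short(x)
    print(decode_short(token), token)   # 16-character tokens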