"Split" a image into packages of bytes - python

I am trying to do a project for college which consists of sending images using two Arduino Due boards and Python. I have two scripts: one for the client (the one that sends the image) and one for the server (the one that receives the image). I know how to send the bytes and check that they are correct; however, I'm required to "split" the image into packages that have:
a header that has a size of 8 bytes and must be in this order:
the first byte must say the payload size;
the next three bytes must say how many packages will be sent in total;
the next three bytes must say which package I'm currently at;
the last byte must contain a code to an error message;
a payload containing data with a maximum size of 128 bytes;
an end of package (EOP) sequence (in this case, 3 bytes).
I managed to create the end-of-package sequence and append it correctly to a payload before sending; however, I'm having trouble creating the header.
I'm currently trying to make the following loop:
from math import ceil

with open(root.filename, 'rb') as f:
    picture = f.read()

picture_size = len(picture)
packages = ceil(picture_size / 128)
last_pack_size = picture_size
EOPs = 0
EOP_bytes = [b'\x15', b'\xff', b'\xd9']

for p in range(1, packages):
    read_bytes = [None, int.to_bytes(picture[(p-1)*128], 1, 'big'),
                  int.to_bytes(picture[(p-1)*128 + 1], 1, 'big')]
    if p != packages:
        endrange = p*128 + 1
    else:
        endrange = picture_size
    for i in range((p-1)*128 + 2, endrange):
        read_bytes.append(int.to_bytes(picture[i], 1, 'big'))
        read_bytes.pop(0)
        if read_bytes == EOP_bytes:
            EOPs += 1

print("read_bytes:", read_bytes)
print("EOP_bytes:", EOP_bytes)
print("EOPs", EOPs)
At the end I expect the server to receive the same number of packages that the client has sent, and then I need to join the packages to recreate the image. I can manage that part; I just need some help with creating the header.

Here is a demo of how to construct your header. It's not a complete solution, but since you only asked for help constructing the header, it may be what you are looking for.
headerArray = bytearray()

def Main():
    global headerArray
    # Sample data
    payloadSize = 254     # 0 - 254
    totalPackages = 1
    currentPackage = 1
    errorCode = 101       # 0 - 254

    AddToByteArray(payloadSize, 1)     # the first byte must say the payload size
    AddToByteArray(totalPackages, 3)   # the next three bytes must say how many packages will be sent in total
    AddToByteArray(currentPackage, 3)  # the next three bytes must say which package I'm currently at
    AddToByteArray(errorCode, 1)       # the last byte must contain a code to an error message

def AddToByteArray(value, numberOfBytes):
    global headerArray
    allocate = value.to_bytes(numberOfBytes, 'little')
    headerArray += allocate

Main()

# Output
print(f"Byte Array: {headerArray}")
for i in range(0, len(headerArray)):
    print(f"Byte Position: {i} Value: {headerArray[i]}")
Obviously I have not included the logic to obtain the current package or total packages.
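To tie it together, here is a minimal sketch (the helper name build_packages and the big-endian byte order are my own choices, not from the question) that slices the image into 128-byte payloads and glues header, payload and EOP into one package per chunk:
from math import ceil

EOP = b'\x15\xff\xd9'  # end-of-package sequence from the question

def build_packages(picture, payload_size=128, error_code=0):
    """Split raw image bytes into [header | payload | EOP] packages.

    Header layout per the question: 1 byte payload size, 3 bytes total
    number of packages, 3 bytes current package index, 1 byte error code.
    """
    total = ceil(len(picture) / payload_size)
    packages = []
    for p in range(1, total + 1):
        payload = picture[(p - 1) * payload_size : p * payload_size]
        header = (len(payload).to_bytes(1, 'big')
                  + total.to_bytes(3, 'big')
                  + p.to_bytes(3, 'big')
                  + error_code.to_bytes(1, 'big'))
        packages.append(header + payload + EOP)
    return packages

# Usage:
# with open(root.filename, 'rb') as f:
#     packages = build_packages(f.read())
On the receiving side, the first header byte then tells you how many payload bytes to keep before the EOP when you reassemble the image.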

Related

Generating a UDP message with a header and payload in Python 3

I am new to networking and am trying to implement a network calculator using Python 3, where the client's responsibility is to send operands and operators and the server calculates the result and sends it back to the client. Communication is through UDP messages and I am working on the client side. Each message is comprised of a header and a payload, described in the figures below.
UDP header: (figure omitted)
I am familiar with sending string messages using sockets, but I'm having a hard time working out how to make a message with both a header and a payload, how to assign the bits for the various attributes, and how to generate the message/client IDs in the header (and whether there is any way to generate the IDs automatically). Any help or suggestions will be highly appreciated.
Thanks in advance
I will only do a portion of your homework.
I hope it will help you find the energy to work on the missing parts.
import struct
import socket

CPROTO_ECODE_REQUEST, CPROTO_ECODE_SUCCESS, CPROTO_ECODE_FAIL = (0, 1, 2)

ver = 1   # version of protocol
mid = 0   # initial value
cid = 99  # client Id (arbitrary)

sock = socket.socket( ...)  # to be customized

def sendRecv(num1, op, num2):
    global mid
    ocs = ("+", "-", "*", "/").index(op)
    byte0 = ver + (ocs << 3) + (CPROTO_ECODE_REQUEST << 6)
    hdr = struct.pack("!BBH", byte0, mid, cid)
    parts1 = (b'0000' + num1.encode() + b'0000').split(b'.')
    parts2 = (b'0000' + num2.encode() + b'0000').split(b'.')
    msg = hdr + parts1[0][-4:] + parts1[1][:4] + parts2[0][-4:] + parts2[1][:4]
    sock.send(msg)         # send request
    bufr = sock.recv(512)  # get answer
    # to do:
    #   complete sock.send and sock.recv
    #   unpack bufr into: verr, ecr, opr, value_i, value_f
    #   verify that verr, ecr, opr are appropriate
    #   combine value_i and value_f into answer
    mid += 1
    return answer

result = sendRecv('2.47', '+', '46.234')
There are many elements that haven't been specified by your teacher:
what should the byte ordering on the network be (big-endian or little-endian)? The example above assumes big-endian, but you can easily modify the 'pack' statement to use little-endian.
What should the program do if the received packet header is invalid?
What should the program do if there's no answer from server?
Payload: how should we interpret "4 most significant digits of fraction"? Does that mean that the value is in ASCII? That's not specified.
Payload: assuming the fraction is in ASCII, should it be right-justified or left-justified in the packet?
Payload: same question for integer portion.
Payload: if the values are in binary, are they signed or unsigned? It will have an effect on the unpacking statement.
In the program above, I assumed that:
values are positive and in ASCII (without sign)
integer portion is right-justified
fractional portion is left justified
Have fun!
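As a hedged sketch of the remaining "to do" part, assuming the reply mirrors the request layout (a 4-byte header followed by 4 ASCII digits of integer part and 4 ASCII digits of fraction part; the real layout comes from the assignment's figures, so adjust accordingly):
import struct

def parse_reply(bufr):
    # header: byte0 packs version (bits 0-2), opcode (bits 3-5), error code (bits 6-7)
    byte0, midr, cidr = struct.unpack("!BBH", bufr[:4])
    verr = byte0 & 0x07
    opr = (byte0 >> 3) & 0x07
    ecr = (byte0 >> 6) & 0x03
    # payload: integer part right-justified, fraction part left-justified, both ASCII
    value_i = int(bufr[4:8])
    value_f = bufr[8:12].decode().rstrip('0') or '0'
    answer = float(f"{value_i}.{value_f}")
    return verr, ecr, opr, answer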

Zeroing/blacking out pixels in a .tiff-like file (.svs or .ndpi)

I am trying to zero out the pixels in some .tiff-like biomedical scans (.svs & .ndpi) by changing values directly in the binary file.
For reference, I am using the docs on the .tiff format here.
As a sanity check, I've confirmed that the first two bytes have the values 73 and 73 (I and I in ASCII), meaning the file is little-endian, and that the next two bytes hold the value 42 (both as expected according to the docs just mentioned).
I wrote a Python script that reads the IFD (Image File Directory) and its components, but I am having trouble proceeding from there.
My code is this:
with open('scan.svs', "rb") as f:
    # Read away the first 4 bytes:
    f.read(4)
    # Read offset of first IFD as the four next bytes:
    IFD_offset = int.from_bytes(f.read(4), 'little')
    # Move to IFD:
    f.seek(IFD_offset, 0)
    # Read IFD:
    IFD = f.read(12)
    # Get components of IFD:
    tag = int.from_bytes(IFD[:2], 'little')
    field_type = int.from_bytes(IFD[2:4], 'little')
    count = int.from_bytes(IFD[4:8], 'little')
    value_offset = int.from_bytes(IFD[8:], 'little')
    # Now what?
The values for the components are tag=16, field_type=254, count=65540 and value_offset=0.
How do I go from there?
PS: Using Python is not a must, if there is some other tool that could more easily do the job.
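For reference, a minimal sketch of walking the IFDs according to the standard TIFF 6.0 layout: an IFD starts with a 2-byte entry count, followed by that many 12-byte entries (tag, type, count, value/offset) and a 4-byte offset to the next IFD (0 when there are none). The snippet above reads the entry count as part of the first entry, which would explain the odd-looking values; note also that some scans are BigTIFF, which uses a different layout.
import struct

def list_ifd_entries(path):
    with open(path, 'rb') as f:
        f.read(4)                                         # 'II' + 42
        ifd_offset = int.from_bytes(f.read(4), 'little')  # offset of first IFD
        while ifd_offset:
            f.seek(ifd_offset)
            n_entries = int.from_bytes(f.read(2), 'little')
            for _ in range(n_entries):
                tag, field_type, count, value = struct.unpack('<HHII', f.read(12))
                print(tag, field_type, count, value)
            ifd_offset = int.from_bytes(f.read(4), 'little')  # 0 means no more IFDs

list_ifd_entries('scan.svs')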

How to fix this IO bound python operation on 12GB .bin file?

I'm reading the book Hands-On Machine Learning for Algorithmic Trading and I came across a script that is supposed to parse a large .bin binary file and convert it to .h5. The file consists of something called ITCH data; you can find the technical documentation of the data here. The script is very inefficient: it reads a 12 GB (12,952,050,754 bytes) file 2 bytes at a time, which is ultra slow (it might take up to 4 hours on a decent 4-CPU GCP instance), and that is not very surprising. You can find the whole notebook here.
My problem is that I don't understand how this .bin file is being read; I don't see why it is necessary to read the file 2 bytes at a time. I think there is a way to read it with a larger buffer size, but I'm not sure how to do it. I could also convert the script to C++ if it is still slow after optimizing, but I can only do that once I understand the inner workings of this I/O process. Does anyone have suggestions?
Here's a link to the file source of the ITCH data; you can find smaller files (300 MB or less) covering shorter time periods if you need to experiment with the code.
The bottleneck:
with file_name.open('rb') as data:
    while True:
        # determine message size in bytes
        message_size = int.from_bytes(data.read(2), byteorder='big', signed=False)
        # get message type by reading first byte
        message_type = data.read(1).decode('ascii')
        message_type_counter.update([message_type])
        # read & store message
        record = data.read(message_size - 1)
        message = message_fields[message_type]._make(unpack(fstring[message_type], record))
        messages[message_type].append(message)

        # deal with system events
        if message_type == 'S':
            seconds = int.from_bytes(message.timestamp, byteorder='big') * 1e-9
            print('\n', event_codes.get(message.event_code.decode('ascii'), 'Error'))
            print(f'\t{format_time(seconds)}\t{message_count:12,.0f}')
            if message.event_code.decode('ascii') == 'C':
                store_messages(messages)
                break
        message_count += 1

        if message_count % 2.5e7 == 0:
            seconds = int.from_bytes(message.timestamp, byteorder='big') * 1e-9
            d = format_time(time() - start)
            print(f'\t{format_time(seconds)}\t{message_count:12,.0f}\t{d}')
            res = store_messages(messages)
            if res == 1:
                print(pd.Series(dict(message_type_counter)).sort_values())
                break
            messages.clear()
And here's the store_messages() function:
def store_messages(m):
    """Handle occasional storing of all messages"""
    with pd.HDFStore(itch_store) as store:
        for mtype, data in m.items():
            # convert to DataFrame
            data = pd.DataFrame(data)

            # parse timestamp info
            data.timestamp = data.timestamp.apply(int.from_bytes, byteorder='big')
            data.timestamp = pd.to_timedelta(data.timestamp)

            # apply alpha formatting
            if mtype in alpha_formats.keys():
                data = format_alpha(mtype, data)

            s = alpha_length.get(mtype)
            if s:
                s = {c: s.get(c) for c in data.columns}
            dc = ['stock_locate']
            if m == 'R':
                dc.append('stock')
            try:
                store.append(mtype,
                             data,
                             format='t',
                             min_itemsize=s,
                             data_columns=dc)
            except Exception as e:
                print(e)
                print(mtype)
                print(data.info())
                print(pd.Series(list(m.keys())).value_counts())
                data.to_csv('data.csv', index=False)
                return 1
    return 0
According to the code, the file format looks like it's 2 bytes of message size, one byte of message type, and then n bytes of actual message (the length defined by the previously read message size).
Low-hanging fruit for optimizing this is to read 3 bytes at once into a buffer, convert bytes [0:2] to the message size int and byte [2] to the message type, and then read the message.
To further reduce the number of required reads, you could read a fixed amount of data from the file into a buffer and start extracting messages from it. While extracting, keep an index of the bytes already processed, and once that index (or index plus the amount of data to be read) goes past the end of the buffer, refill it. This could lead to huge memory requirements if not done properly, though.
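Building on that, here is a rough sketch (the function name, chunk size, and yielding of raw records are my own choices, not from the book) of reading the file in large chunks and carving whole messages out of an in-memory buffer:
def iter_messages(path, chunk_size=1 << 20):
    # Yield (message_type, record) pairs, reading the file one large chunk at a time.
    buf = b''
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            buf += chunk
            pos = 0
            while pos + 2 <= len(buf):
                size = int.from_bytes(buf[pos:pos + 2], 'big')
                end = pos + 2 + size
                if end > len(buf):
                    break                            # message continues in the next chunk
                record = buf[pos + 2:end]
                yield chr(record[0]), record[1:]     # message type, payload
                pos = end
            buf = buf[pos:]                          # keep only the unprocessed tail
            if not chunk:                            # EOF: nothing left to refill with
                break
The unpacking with struct and the bookkeeping from the original loop would then operate on the yielded records instead of calling data.read() three times per message.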

python-lzw doesn't decompress larger blobs

I am new to Python and we have been trying to use the lzw code from GitHub in our program.
https://github.com/joeatwork/python-lzw/blob/master/lzw/init.py
This works well if we have a smaller blob, but as the blob size increases it no longer decompresses the blob. I have been reading the documentation, but I am unable to understand the passage below, which might be the reason why the full blob is not getting decompressed.
I have also attached a snippet of the Python code I am using.
Our control codes are
- CLEAR_CODE (codepoint 256). When this code is encountered, we flush
the codebook and start over.
- END_OF_INFO_CODE (codepoint 257). This code is reserved for
encoder/decoders over the integer codepoint stream (like the
mechanical bit that unpacks bits into codepoints)
When dealing with bytes, codes are emitted as variable
length bit strings packed into the stream of bytes.
codepoints are written with varying length
- initially 9 bits
- at 512 entries 10 bits
- at 1025 entries at 11 bits
- at 2048 entries 12 bits
- with max of 4095 entries in a table (including Clear and EOI)
code points are stored with their MSB in the most significant bit
available in the output character.
My code snippet:
def decompress_without_eoi(buf):
    # Decompress LZW into a bytes, ignoring End of Information code
    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            #print(repr(exc))
            if 'End of information code' in repr(exc):
                #print('Ignoring EOI error..\n')
                pass
            else:
                raise
        return

    try:
        #print('Trying a join..\n')
        deblob = b''.join(gen())
    except Exception as exc2:
        #print(repr(exc2))
        #print('Trying byte by byte..')
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception as exc3:
            #print(repr(exc3))
            return b''.join(deblob)
    return deblob

# current function to deblob
def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]
        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True
        if type(blob) != bytes:
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]
        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))
This function is called as follows:
nf['IS_BLOB'] = nf[['IS_BLOB','COMPRESSION']].apply(deblob3,axis=1)

How can I moderate how much data I get from a file stream in python?

I've got an embedded system I'm writing a user app against. The user app needs to take a firmware image and split it into chunks suitable for sending to the embedded system for programming. I'm starting with S-record files and using Xmodem for file transfer (meaning each major 'file' transfer would need to end with an EOF), so the easiest thing for me to do would be to split the image file into a set of files of full S-records, each no greater than the size of the receive buffer of the (single-threaded) embedded system. My user app is written in Python, and I have a C program that will split the firmware image into properly sized files, but I thought there might be a more 'Pythonic' way of going about this, perhaps by using a custom stream handler.
Any thoughts?
Edit: to add to the discussion, I can feed my input file into a buffer. How could I use range to set a hard limit on what goes into the buffer, either by file size or by a full S-record line ('S'-delimited ASCII text)?
I thought this was an interesting question and the S-record format isn't too complicated, so I wrote an S-record encoder that appears to work from my limited testing.
import struct

def s_record_encode(fileobj, recordtype, address, buflen):
    """S-Record encode bytes from file.

    fileobj     file-like object to read data (if any)
    recordtype  'S0' to 'S9'
    address     integer address
    buflen      maximum output buffer size
    """
    # S-type to (address_len, has_data)
    record_address_bytes = {
        'S0': (2, True), 'S1': (2, True), 'S2': (3, True), 'S3': (4, True),
        'S5': (2, False), 'S7': (4, False), 'S8': (3, False), 'S9': (2, False)
    }
    # params for this record type
    address_len, has_data = record_address_bytes[recordtype]
    # big-endian address as string, trimmed to length
    address = struct.pack('>L', address)[-address_len:]
    # read data up to 255 bytes minus address and checksum len
    if has_data:
        data = fileobj.read(0xff - len(address) - 1)
        if not data:
            return '', 0
    else:
        data = ''
    # byte count is address + data + checksum
    count = len(address) + len(data) + 1
    count = struct.pack('B', count)
    # checksum count + address + data
    checksummed_record = count + address + data
    checksum = struct.pack('B', sum(ord(d) for d in checksummed_record) & 0xff ^ 0xff)
    # glue record type to hex encoded buffer
    record = recordtype + (checksummed_record + checksum).encode('hex').upper()
    # return buffer and how much data we read from the file
    return record, len(data)

def s_record_test():
    from cStringIO import StringIO

    # from an example, this should encode to given string
    fake_file = StringIO("\x0A\x0A\x0D\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")
    encode_to = "S1137AF00A0A0D0000000000000000000000000061"
    fake_file.seek(0)
    record, buflen = s_record_encode(fake_file, 'S1', 0x7af0, 80)
    print 'record', record
    print 'encode_to', encode_to
    assert record == encode_to

    fake_file = StringIO()
    for i in xrange(1000):
        fake_file.write(struct.pack('>L', i))
    fake_file.seek(0)
    address = 0
    while True:
        buf, datalen = s_record_encode(fake_file, 'S2', address, 100)
        if not buf:
            break
        print address, datalen, buf
        address += datalen
If you already have a C program then you're in luck. Python is like a scripting language over C with most of the same functions. See Core tools for working with streams for all the familiar C I/O functions. Then you can make your program more Pythonic by rolling functions into classes and using things like Python slice notation.
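For the splitting side of the question, a minimal sketch (the function name and the example 1024-byte limit are illustrative, not from the original post) that groups whole S-record lines into chunks that never exceed the receiver's buffer size:
def chunk_srecords(lines, max_bytes):
    # Group complete S-record lines (newline included) into chunks <= max_bytes.
    chunk, size = [], 0
    for line in lines:
        if size + len(line) > max_bytes and chunk:
            yield ''.join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield ''.join(chunk)

# Usage sketch (send_chunk_via_xmodem is a hypothetical transfer function):
# with open('firmware.srec') as f:
#     for chunk in chunk_srecords(f, 1024):
#         send_chunk_via_xmodem(chunk)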
