Converting binary data to hex in python properly - python

I'm working on a program that uses a BMP and a separate file for the transparency layer. I need to convert them into a PNG from that so I'm using PIL in python to do so. However, I need the data from the transparency file in hex so it can be added to the image. I am using the binascii.hexlify function to do that.
Now, the problem I'm having is for some reason the data, after going through the hexlify function (I've systematically narrowed it down by going through my code piece by piece), looks different than it does in my hex editor and is causing slight distortions in the images. I can't seem to figure out where I am going wrong.
Data before processing in Hex editor
Data after processing in Hex editor
Here is the problematic part off my code:
filename = askopenfilename(parent=root)
with open(filename, 'rb') as f:
content = f.read()
f.close()
hexContent = binascii.hexlify(content).decode("utf-8")
My input
My output (This is hexcontent written to a file. Since I know that it is not going wrong in the writing of the file, and it is also irrelevant to my actual program I did not add that part to the code snippet)
Before anyone asks I tried codecs.encode(content, 'hex') and binascii.b2a_hex(content).
As for how I know that it is this part that is messing up, I printed out binascii.hexlify(content) and found the same part as in the hex editor and it looked identical to what I had got in the end.
Another possibility for where it is going wrong is in the "open(filename, 'rb')" step. I haven't yet thought of a way to test that. So any help or suggestions would be appreciated. If you need one of the files I'm using for testing purposes, I'll gladly add one here.

If I understand your question correctly then your desired output should match Data before processing in Hex editor. I can obtain this with the following code:
with open('Input.alp', 'rb') as f:
i = 0
for i, chunk in enumerate(iter(lambda: f.read(16), b'')):
if 688 <= i * 16 <= 736:
print i * 16, chunk.encode('hex')
Outputs:
688 ffffffffffffffffffffffffffffffff
704 ffffffffffffffffffffffe000000000
720 000000000000000001dfffffffffffff
736 ffffffffffffffffffffffffffffffff
See this answer for a more detailed explanation.

Related

Edit a few lines of uncompressed PDF in Python

I want to edit a few lines in an uncompressed pdf.
I found a similar problem but since I need to scan the file a few times to get the exact line positions I want to change this doesn't really suit (and the pure number of RegEx matches are more than desired).
The pdf contains utf-8 encodable lines (a few of them I want to edit, bookmark target ids in particular)
and a lot of blobs (guess images and so on).
When I edit the file with notepad it's working fine, but when I do it programatically (reading in, changing a few lines, writing back)
images and some formatting is missing. (Sine they are not read in at the firstplace, ignore-option)
with codecs.open("merged-uncompressed.pdf", "r", encoding='ascii', errors='ignore') as f:
I can read the file in with errors="surrogateescape" and wanted to map the lines from above import but don't know if this approach can work.
Does anyone know a way how to deal with this?
Best, Lukas
I was able to solve this:
read the file as binary
marked the lines which couldn't be encoded utf-8
copied the list line by line to a temporary list ( not encodable lines were copied with a placholder 'None\n')
Then I went back to do the searching part on the copied list so I got my lines I wanted to replace
replaced the lines in the original binary list (same indices!)
wrote it back to file
the resulting pdf was a bit corupted because of whitespace before the target ids of the bookmarks but by recompressing qpdf fixed it:)
The code is very messy at the moment and so I don't want to publish it right now.
But I want to add it at github within the next few weeks.
If anyone needs it: just comment and it will have more priority.
Thanks to anyone who wanted to help:)
Lukas

Receiving Data From Universal Robot and Decoding

I am working on a project where we are looking to get some data from a universal robot such as position and force data and then store that data in a text file for later reference. We can receive the data just fine, but turning it into readable coordinates is an issue. An example data string is below:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\xbf\x00\x00\x80\xbf\x00\x00\x80\xbf\x00\x00\x80\xbf\x00\x00\x80\xbf\x00\x00\x80\xbf\x00\x00\xc0?\x00\x00\x16C\x00\x00\xc0?\x00\x00\x16C\x00\x00\x00?\xcd\xcc\xcc>\x00\x00\x96C\x00\x00\xc8A\x1e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x88\xfb\x7f?\xd0M><\xc0G\x9e:tNT?\r\x11\x07\xbc\xb9\xfd\x7f?~\xa0\xa1:\x03\x02+?\x16\xeb\x7f\xbf#\xce\xcc\xbc9\xdfl\xbbq\xc3\x8a>i\x19T<\xf3\xf9\x7f\xbf\xb4k\x87\xbb->\xc2>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80?\xdb\x0f\xc9#\xa7\xdcU#\xa7\xdcU#\xa7\xdcU#\xa7\xdcU#\xa7\xdcU#\xa7\xdcU#\xfe\xff\xff\xff\xfe\xff\xff\xff\xfe\xff\xff\xff\xfe\xff\xff\xff\xfe\xff\xff\xff\xff\xff\xff\xff\xecb\xc7#\xecb\xc7#\xecb\xc7#\
*not entire string received
At first I thought it was hex so I tried the code:
packet_12 = packet_12.encode('hex')
x = str(packet_12)
x = struct.unpack('!d', packet_12.decode('hex'))[0]
all_data.write("X=", x * 1000)
But to no avail. I tried several different decoding methods using codecs and .encode, but none worked. I found on a different post here the two code blocks below:
y = codecs.decode(packet_12, 'utf-8', errors='ignore')
packet_12 = s.recv(8)
z = str(packet_12)
x = ''.join('%02x' % ord(c) for c in packet_12)
Neither worked for my application. Finally I tried saving the entire sting in a .txt file and opening it with python and decoding it with the code below, but again nothing seemed to happen.
with io.open('C:\\Users\\myuser\\Desktop\\decode.txt', 'r', encoding='utf8') as f:
text = f.read()
with io.open('C:\\Users\\myuser\\Desktop\\decode', 'w', encoding='utf8') as f:
f.write(text)
I am aware I might be missing something incredibly simple such as using the wrong decoding type or I might even have jibberish as the robot output, but any help is appreciated.
The easiest way to receive data from the robot with python is to use Universal Robots' Real-Time-Data-Exchange Interface. They offer some python examples for receiving and sending data.
Check out my GitHub repo for an example code which is based on the official code from UR:
https://github.com/jonenfabian/Read_Data_From_Universal_Robots

I read image file as binary strange symbols appear

I read an image file in binary in python:
open('chall.png, 'rb').read()
Result:
b'\xe0>8.~cxfein{ ;-0lek\xf7virejneinv\xe7I\x01blo7\x14"1\x07;\x03\x1bE\x19\x1c\x19\x0f\x1a\x05\x07L\x11\x10\x1e\x13I\x16\x11\x0b\nei\x16\xac\x84\xeb2\xf4O\xdcd*\x89\x1af7`e\xf7i\xd7j\xd7\x03\xe7\x15\x8c\x80\x92,$>L\x0f\xa4\xf2\x94\x98\xe9IE\x06#7\xb5\xfc |g\xe1{\xbf\x11\x93\x94\x1e\x11\x88\xaf8\x13\xcb#\x08\xbf\x1b\xdeO-\x1c\xb6M\xf6FS\xcb6\x9c\n,\x99\x90\x90\x14\xfb\xf8\x97\x1a\x94\xcb\x
(the binary code of the file is larger than this)
Wait what ? Binary is a lot of 1's and 0's. Okay, perhaps this is hexadecimal (a format that makes binary more readable for humans) ?
Nope, this is certainly not hexadecimals either! What is going on ?
What am I dealing with here ?
How can I convert it into hexadecimals or something more readable than this ? (as you might guess I am quite new to this. Please be nice.)
EDIT:
file = open('image.png', 'rb').read()
file[0]
#output: 224
file[1]
#output: 62
How come the output of the first "character" (the first index) be 224? Shouldn't it be \xe0?
When you read a binary data and try to print it, the binary data is tried and decoded into utf-8 by default. Thats why you see strange characters. The below code formats as hex before printing. You should see hexadecimals data with out any stage characters using the below code.
for i in open('image.png', 'rb').read():
print(r'{0:#x}'.format(i), end=' ')

Can't convert base64 images from csv to file in Python

I know it looks like it has been answered before, but I can't seem to find a solution for this issue. I have a CSV file that contains very long strings of Base64 encoded images (~5mb each). I enabled the CSV field size limit to max. There are several of the images decoded separated by columns then a few values that are only a couple of words long. I can read these through print(row[7]) for example no problem. The images won't print the base64 strings, and i'm trying to decode them and save them to the filesystem but they end up being empty. Any thoughts?
fh = open("~path~/image.png", "wb")
x = base64.b64decode(row[1])
fh.write(x)
fh.close()
Thanks for any help!
EDIT: Works now. CSV split on python seems to act a little different than in Java. The empty values came up due to the csv being saved differently than the exporting tool I used indicated, so it was left with values ("8",,"data:image/png;base64,IR0BRR....",...). I didn't catch the empty space before, which is why it was showing blank, and then I also attempted to append the data:image/png part to the beginning of itself since I believed that python string split would split the comma after base64 like Java would. After adjusting for this, the image correctly saves in my filesystem.

how to get tell() to work

I'm trying to open a file and read from the last point read. My files are rather big (20 Mb to ~ 1 Gb) After doing some research it seems that tell() and seek() would be one of the most efficient ways to perform this. I've tried the following code
opened = open(filename, "rU")
f1 = csv.reader(opened)
k = []
for line in f1:
k.append(opened.tell())
When I do this every value in the list is 8272 Long. Does that mean that I cannot use this implementation? Is there something I'm missing? Thanks for your help!
I'm running python 2.7 in Windows 7
Update
After piecing together everything learned here and trial and error I get the following code
opened = open(filename, "rU")
k = [0]
where = 1
for switch in opened:
where += len(switch) + 1
f = StringIO.StringIO(switch)
interesting = csv.reader(f, delimiter=',')
good_values = interesting.next()
k.append(where)
return k
This allows the user to know exactly where in the file to go to while still being able to parse it according to its format. I'm not completely sure of why the offsets need to be constantly added (It seems that the newline is not accurately accounted for in len()).
It looks like the csv.reader is reading the file in chunks of 8272 bytes, that's why you see this number returned from opened.tell() many times - until, I guess, you have read all the lines from your file in the range of 0-8272. After that you will see 8272*2 a few times, exact number will depend on the length of the lines in the buffer read.
So, basically, in your program, tell() doesn't give you offsets of new CSV lines, as you seem to assume. It's only telling you about offset of the end of the file's region currently read into an internal OS buffer used by system functions used to implement the Python's IO functions.

Categories

Resources