I read image file as binary strange symbols appear - python

I read an image file in binary mode in Python:
open('chall.png', 'rb').read()
Result:
b'\xe0>8.~cxfein{ ;-0lek\xf7virejneinv\xe7I\x01blo7\x14"1\x07;\x03\x1bE\x19\x1c\x19\x0f\x1a\x05\x07L\x11\x10\x1e\x13I\x16\x11\x0b\nei\x16\xac\x84\xeb2\xf4O\xdcd*\x89\x1af7`e\xf7i\xd7j\xd7\x03\xe7\x15\x8c\x80\x92,$>L\x0f\xa4\xf2\x94\x98\xe9IE\x06#7\xb5\xfc |g\xe1{\xbf\x11\x93\x94\x1e\x11\x88\xaf8\x13\xcb#\x08\xbf\x1b\xdeO-\x1c\xb6M\xf6FS\xcb6\x9c\n,\x99\x90\x90\x14\xfb\xf8\x97\x1a\x94\xcb\x
(the binary code of the file is larger than this)
Wait, what? Binary is a lot of 1s and 0s. Okay, perhaps this is hexadecimal (a format that makes binary more readable for humans)?
Nope, this is certainly not hexadecimal either! What is going on?
What am I dealing with here?
How can I convert it into hexadecimal or something more readable than this? (As you might guess, I am quite new to this. Please be nice.)
EDIT:
file = open('image.png', 'rb').read()
file[0]
#output: 224
file[1]
#output: 62
How come the first "character" (the first index) is 224? Shouldn't it be \xe0?

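When you print a bytes object, Python shows its repr: every byte that corresponds to a printable ASCII character is shown as that character, and every other byte is shown as an escape such as \xe0. That's why you see strange characters. Indexing a bytes object returns an integer, which is why file[0] is 224 (the decimal value of 0xe0). The code below formats each byte as hex before printing, so you should see hexadecimal data without any strange characters.
for i in open('image.png', 'rb').read():
    print('{0:#x}'.format(i), end=' ')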
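To connect this to the EDIT in the question, here is a minimal sketch that uses only the first four bytes of the output shown above (the variable name data is just for illustration). It suggests that 224 and \xe0 are the same byte written in two notations, and that the > in the output is simply byte 62 shown as its ASCII character.

```python
# First four bytes of the output from the question, typed in by hand.
data = b'\xe0>8.'

print(data[0])                   # 224  (indexing bytes gives an int)
print(hex(data[0]))              # 0xe0 (the same value in hexadecimal)
print(format(data[0], '08b'))    # 11100000 (the same value as 1s and 0s)
print(data[1], chr(data[1]))     # 62 >  (a printable byte is shown as its character)
print(data.hex())                # e03e382e (all bytes as one hex string)
```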

Related

How to read binary data and print in binary or hexadecimal format?

I am working with binary data. I have a file containing 2KB of binary data. I have used the following code to read the file and then print it. I have also tried to view the file contents using hexdump in the terminal. I am getting different outputs 1 and 2 (shown in attached screenshots) of the same file in python and hexdump. I am assuming it might be due to the encoding scheme used by python? I am very naive about working with binary data. Can anyone kindly check it and let me know the reasons for this? I also want to know if that's the correct way of reading a large binary file?
print("First File \n");
f1 = open("/data/SRAMDUMP/dataFiles/first.bin","rb")
num1 = list(f1.read())
print(num1)
f1.close()
I am assuming it might be due to the encoding scheme used by python?
There is no "encoding scheme". hexdump formats binary data as hex (two nibbles per byte), whereas you've converted the binary contents of the file to a list, which yields a list of integers (since that's what bytes are in Python).
If you want to convert bytes to printable hex in Python, use the bytes.hex method. If you want something similar to hexdump, you'll need to take care of slicing, spacing and carriage-return-ing.
A slow version would simply read the file 2 bytes per 2 bytes, hex them, print them, then linebreak every 16 bytes. Python 3.8 adds formatting facilities to bytes.hex() meaning you can even more easily read bytes 16 by 16 with a separator every two, though that doesn't exactly match hexdump's format:
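import functools
import sys

f = open(sys.argv[1], 'rb')
# the sentinel must be b'' (not ''), because read() returns empty bytes at end of file
it = iter(functools.partial(f.read, 16), b'')
for i, b in enumerate(it):
    print(f'{16*i:07x} {b.hex(" ", 2)}')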
Also beware that hexdump follows the platform's endianness by default which is... rarely what you want, and will not match your Python output. hexdump -C prints bytes in the file order.
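As a rough illustration of that endianness caveat, the sketch below assumes a little-endian machine and four made-up sample bytes. It mimics the default 16-bit-word view by unpacking with struct, then prints the plain file order for comparison.

```python
import struct

# Four made-up bytes standing in for the start of a file.
data = bytes([0x01, 0x02, 0x03, 0x04])

# Interpret the bytes as two little-endian 16-bit words, which is roughly
# what hexdump's default format shows on a little-endian platform.
words = struct.unpack('<2H', data)
word_view = ' '.join(f'{w:04x}' for w in words)
print(word_view)          # 0201 0403

# File order, one byte at a time (comparable to hexdump -C); needs Python 3.8+.
byte_view = data.hex(' ')
print(byte_view)          # 01 02 03 04
```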

How to read a text file with base-N input, and convert it to hex (and eventually ascii)?

I'm very new to Python and I'm looking for help creating a program that will execute the following algorithm. It's purposely simple, and it should be noted that the data is in a stream, so I can't (or at least, I don't think I can) just open the text file and convert it using a single function. I'm cranking away at it to learn the language and options, but would like to see how some experts would tackle this problem. Doesn't need to be user friendly and I'd like files output at each step so I can see the output at each step.
Here's the algorithm I'm recommending:
Open file "base-n.txt"
For each line in file
    Remove carriage returns
    Write line to "Clean File" *#to create a single stream of characters#*
Open file "Clean File"
For each line in file
    Read the first x characters *#I presume x depends on n in base n#*
    Convert the characters from base n to base 16
    Write the characters to "Output file"
Open file Output File
For each line in file
    Convert line to ASCII
    Print ASCII line
End
The files are not large... usually just a few hundred lines of base n information. For example, the below is an example of the base-5 text.
0322040104130344042104140401011204310421011203430342043004010112020301130020
0301042104240401041401120410042204300432041401120400042104130421042401120430
0410043101120342041404010431013401120344042104200430040103440431040104310432
0424011203420400041004220410043003440410042004030112040104130410043101410112
0233043204100430011203440421042004030432040101120413041003430401042404210020
0430040104140134011204200421042001120344042104200433034204130413041004300112
0411043204300431042101120413034203440410042004100342011203420141011203130432
0430042204010420040004100430043004010112042304320410043001120413034203440432
0430011204200421042001120413041004030432041303420112040003420422041003430432
0430002004210424042003420424040101410112031004240421041004200112040003420422
Thanks in advance for the help. I'm looking forward to getting much better at Python, but have a real need for this algorithm in the short term.
I left a comment suggesting the hex() function. Here is an example for decimal to hex:
while True:
    print("Enter 'x' for exit.")
    dec = input("Enter number in Decimal Format: ")
    if dec == 'x':
        break
    else:
        decimal = int(dec)
        print(decimal, "in Hexadecimal =", hex(decimal), "\n")
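For the base-n text itself, a minimal sketch might look like the following. It assumes each character is stored as a fixed group of 4 base-5 digits; that width is inferred from the sample (the first group, 0322, is 87 in base 5, which is ASCII 'W') and is not stated in the question. The function name base_n_to_codes is hypothetical, and the sample string is the first 32 digits of the first line above.

```python
def base_n_to_codes(text, base=5, width=4):
    """Split a digit stream into fixed-width groups and parse each in the given base.

    width=4 is an assumption inferred from the sample data.
    """
    digits = ''.join(text.split())  # drop newlines and carriage returns
    return [int(digits[i:i + width], base) for i in range(0, len(digits), width)]

# First 32 digits of the first sample line.
sample = '03220401041303440421041404010112'

codes = base_n_to_codes(sample)
hex_text = ' '.join(format(c, '02x') for c in codes)
ascii_text = ''.join(chr(c) for c in codes)

print(hex_text)     # 57 65 6c 63 6f 6d 65 20
print(ascii_text)   # Welcome (followed by a space)
```

Python's int() accepts any base from 2 to 36, so the intermediate files from the algorithm are not strictly needed, although each stage could still be written out for inspection.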

Reading bytes from wave file python

I'm working on a little audio project and part of it requires using wave files and flac files. I'm trying to figure out how to read the metadata in each and how to add tags manually. I'm having trouble figuring out how to read the bytes as they are.
I have been referencing this page and a couple others to see the full format of a Wave file however for some wave files I get some discrepancies. I want to be able to see the hexadecimal bytes in order to see what differences are occurring.
Using simply open('fname', 'rb') and read, only returns the bytes as strings. Using struct.unpack has worked for some wave files however it is limited to printing as strings, ints, or shorts and I can't see exactly what is going wrong when I use it. Is there any other way I can read this file in hex?
Thanks
I assume that you just want to display the content of a binary file in hexadecimal. First, you do not need to use Python for that, as some editors do it natively, for example vim.
Now assuming you have a bytes object that you got by reading a file, you can easily change it to a list of hexadecimal values:
with open('fname', 'rb') as fd:  # open the file
    data = fd.read(16)  # read 16 bytes from it
h = [hex(b) for b in data]  # iterating over bytes yields ints in Python 3, so no ord() is needed
print(h)  # prints a list of hexadecimal codes of the read bytes
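Since the question is about comparing wave headers, here is a hedged sketch that looks at the first 12 bytes of a RIFF/WAVE file both as raw hex and as unpacked fields. The header is built in memory with a made-up chunk size of 36 so the example runs without a real file; with an actual file you would read the same 12 bytes from it instead.

```python
import struct

# A made-up 12-byte RIFF header: "RIFF", little-endian 32-bit size, "WAVE".
header = b'RIFF' + struct.pack('<I', 36) + b'WAVE'

# Raw view: every byte as two hex digits (the separator needs Python 3.8+).
raw_hex = header.hex(' ')
print(raw_hex)      # 52 49 46 46 24 00 00 00 57 41 56 45

# Structured view: 4-byte tag, unsigned 32-bit size, 4-byte format tag.
chunk_id, chunk_size, wave_id = struct.unpack('<4sI4s', header)
print(chunk_id, chunk_size, wave_id)   # b'RIFF' 36 b'WAVE'
```

Printing both views side by side can make it easier to see where a particular file deviates from the layout you expect.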

strip out binary data from text file in python

I have a text file that contains some binary data. When I read the file in text mode using Python 3, I get a UnicodeDecodeError (codec can't decode byte...) with the following lines of code:
inFile = open('myfile.txt', 'r')
for line in inFile:
How can I remove the binary data from my file? I have a header that is printed just before each block of binary data (in this case it is shown as Data Block). For example, my file looks like this, where I want to remove the çºí?¼Èדñdí:
myfile.txt:
ABCDEFGH
123456
Data Block 11
çºí?¼Èדñdí
XYZ123
The result I want is for myfile.txt to look like this:
ABCDEFGH
123456
Data Block 11
XYZ123
This is difficult, because "binary" blobs may contain valid characters or character sequences. And if you're using a file that has "text" using multi-byte encoding, forget about it.
If you know the "text" in your file only contains single-byte characters, one approach would be to read the file in as bytes, then use something like
decode('ascii', errors='ignore')
This effectively strips non-ascii characters out of the output, but if you were to do this on your file, you'd get:
ABCDEFGH
123456
Data Block
?d
XYZ123
Note the second to last line -- valid ascii characters were found in the blob and treated as "text".
You may start with a solution like that, and fine-tune it (if possible) to meet your needs. Maybe the blobs occur by themselves on lines so that if a line has any non-ascii characters, throw out the entire line completely. Maybe you can look at the blobs and try to grok some structure it has. Maybe you just settle for having random lines of partial characters in there and handle them somehow later. It's kind of application-specific at that point.
Here's the code I used to produce that output from your sample input:
def strip_nonascii(b):
    # drop every byte that is not valid ASCII and return the rest as str
    return b.decode('ascii', errors='ignore')

with open('garbled.txt', 'rb') as f:
    for line in f:
        print(strip_nonascii(line), end='')
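The stricter variant mentioned above, throwing out any line that contains a non-ASCII byte, could be sketched as follows. The function name keep_ascii_lines and the sample bytes are made up for illustration, and the approach assumes each blob sits on its own line and contains no newline bytes of its own.

```python
def keep_ascii_lines(raw):
    """Return only the lines of raw (bytes) that decode cleanly as ASCII."""
    kept = []
    for line in raw.splitlines(keepends=True):
        try:
            kept.append(line.decode('ascii'))
        except UnicodeDecodeError:
            # The line holds at least one non-ASCII byte: treat it as a blob.
            pass
    return ''.join(kept)

# Made-up stand-in for the file contents, with one "binary" line.
sample = b'ABCDEFGH\n123456\nData Block 11\n\xe7\xba\xed?\xbc\xc8d\xed\nXYZ123\n'

cleaned = keep_ascii_lines(sample)
print(cleaned, end='')
```

A blob that happens to consist only of ASCII bytes would still slip through, so this is a heuristic rather than a reliable filter.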
If you also have a footer after the binary data (as you have a header before it), you could try using a regular expression to replace everything between the header and the footer with nothing.

Outputting string to binary file doesn't work

For some reason, I cannot get a simple string to be output to a binary file with python.
Here is my code:
strin = bytes(strin, '3DFILE')
dataH = struct.pack('s', strin)
outFile.write(dataH)
I'm trying to write a 3D model exporter for a game I am making with Blender. Can someone please help me out here, or give me an example? I get the error that string is not defined.
Python 3 strings are sequences of unicode characters. The characters are abstract, and they have no binary representation until you say what encoding should be used.
If you have binary data, you can write it to the binary file (opened with binary mode like outFile = open(filename, 'wb') ... outFile.close()) without problem. However, writing binary data to the file opened in text mode cannot be done. It was different in Python 2 where strings were actually sequences of bytes and even the open text file object did not care.
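A minimal sketch of that idea, assuming the goal is to write the ASCII tag 3DFILE into a binary file: the string is encoded to bytes first, and the struct format carries an explicit length, because a bare 's' packs only a single byte. The file name model.bin and the temporary directory are placeholders for illustration.

```python
import os
import struct
import tempfile

text = '3DFILE'
data = text.encode('ascii')     # str -> bytes needs an explicit encoding

# '6s' packs all six bytes; a bare 's' would keep only the first one.
packed = struct.pack(f'{len(data)}s', data)

# Placeholder output location, used only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'model.bin')
with open(path, 'wb') as out_file:      # 'wb' = binary mode
    out_file.write(packed)

with open(path, 'rb') as in_file:
    written = in_file.read()
print(written)   # b'3DFILE'
```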
