In Python, when I try to read in an executable file with 'rb', instead of getting the binary values I expected (0010001 etc.), I'm getting a series of letters and symbols that I do not know what to do with.
Ex: ???}????l?S??????V?d?\?hG???8?O=(A).e??????B??$????????: ???Z?C'???|lP#.\P?!??9KRI??{F?AB???5!qtWI??8???!ᢉ?]?zъeF?̀z??/?n??
How would I access the binary numbers of a file in Python?
Any suggestions or help would be appreciated. Thank you in advance.
That is the binary. They are stored as bytes, and when you print them, they are interpreted as ASCII characters.
You can use the bin() function and the ord() function to see the actual binary codes.
for value in enumerate(data):
print bin(ord(value))
Byte sequences in Python are represented using strings. The series of letters and symbols that you see when you print out a byte sequence is merely a printable representation of bytes that the string contains. To make use of this data, you usually manipulate it in some way to obtain a more useful representation.
You can use ord(x) or bin(x) to obtain decimal and binary representations, respectively:
>>> f = open('/tmp/IMG_5982.JPG', 'rb')
>>> data = f.read(10)
>>> data
'\x00\x00II*\x00\x08\x00\x00\x00'
>>> data[2]
'I'
>>> ord(data[2])
73
>>> hex(ord(data[2]))
'0x49'
>>> bin(ord(data[2]))
'0b1001001'
>>> f.close()
The 'b' flag that you pass to open() does not tell Python anything about how to represent the file contents. From the docs:
Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
Unless you just want to look at what the binary data from the file looks like, Mark Pilgrim's book, Dive Into Python, has an example of working with binary file formats. The example shows how you can read IDv1 tags from an MP3 file. The book's website seems to be down, so I'm linking to a mirror.
Each character in the string is the ASCII representation of a binary byte. If you want it as a string of zeros and ones then you can convert each byte to an integer, format it as 8 binary digits and join everything together:
>>> s = "hello world"
>>> ''.join("{0:08b}".format(ord(x)) for x in s)
'0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
Depending on if you really need to analyse / manipulate things at the binary level an external module such as bitstring could be helpful. Check out the docs; to just get the binary interpretation use something like:
>>> f = open('somefile', 'rb')
>>> b = bitstring.Bits(f)
>>> b.bin
0100100101001001...
Use ord(x) to get the integer value of each byte.
>>> with open('settings.dat', 'rb') as file:
... data = file.read()
...
>>> for index, value in enumerate(data):
... print '0x%08x 0x%02x' % (index, ord(value))
...
0x00000000 0x28
0x00000001 0x64
0x00000002 0x70
0x00000003 0x30
0x00000004 0x0d
0x00000005 0x0a
0x00000006 0x53
0x00000007 0x27
0x00000008 0x4d
0x00000009 0x41
0x0000000a 0x49
0x0000000b 0x4e
0x0000000c 0x5f
0x0000000d 0x57
0x0000000e 0x49
0x0000000f 0x4e
If you realy want to convert the binaray bytes to a stream of bits, you have to remove the first two chars ('0b') from the output of bin() and reverse the result:
with open("settings.dat", "rb") as fp:
print "".join( (bin(ord(c))[2:][::-1]).ljust(8,"0") for c in fp.read() )
If you use Python prior to 2.6, you have no bin() function.
Related
I would like to scan through data files from GPS receiver byte-wise (actually it will be a continuous flow, not want to test the code with offline data). If find a match, then check the next 2 bytes for the 'length' and get the next 2 bytes and shift 2 bits(not byte) to the right, etc. I didn't handle binary before, so stuck in a simple task. I could read the binary file byte-by-byte, but can not find a way to match by desired pattern (i.e. D3).
with open("COM6_200417.ubx", "rb") as f:
byte = f.read(1) # read 1-byte at a time
while byte != b"":
# Do stuff with byte.
byte = f.read(1)
print(byte)
The output file is:
b'\x82'
b'\xc2'
b'\xe3'
b'\xb8'
b'\xe0'
b'\x00'
b'#'
b'\x13'
b'\x05'
b'!'
b'\xd3'
b'\x00'
b'\x13'
....
how to check if that byte is == '\xd3'? (D3)
also would like to know how to shift bit-wise, as I need to check decimal value consisting of 6 bits
(1-byte and next byte's first 2-bits). Considering, taking 2-bytes(8-bits) and then 2-bit right-shift
to get 6-bits. Is it possible in python? Any improvement/addition/changes are very much appreciated.
ps. can I get rid of that pesky 'b' from the front? but if ignoring it does not affect then no problem though.
Thanks in advance.
'That byte' is represented with a b'' in front, indicating that it is a byte object. To get rid of it, you can convert it to an int:
thatbyte = b'\xd3'
byteint = thatbyte[0] # or
int.from_bytes(thatbyte, 'big') # 'big' or 'little' endian, which results in the same when converting a single byte
To compare, you can do:
thatbyte == b'\xd3'
Thus compare a byte object with another byte object.
The shift << operator works on int only
To convert an int back to bytes (assuming it is [0..255]) you can use:
bytes([byteint]) # note the extra brackets!
And as for improvements, I would suggest to read the whole binary file at once:
with open("COM6_200417.ubx", "rb") as f:
allbytes = f.read() # read all
for val in allbytes:
# Do stuff with val, val is int !!!
print(bytes([val]))
Is there a simple way to, in Python, read a file's hexadecimal data into a list, say hex?
So hex would be this:
hex = ['AA','CD','FF','0F']
I don't want to have to read into a string, then split. This is memory intensive for large files.
s = "Hello"
hex_list = ["{:02x}".format(ord(c)) for c in s]
Output
['48', '65', '6c', '6c', '6f']
Just change s to open(filename).read() and you should be good.
with open('/path/to/some/file', 'r') as fp:
hex_list = ["{:02x}".format(ord(c)) for c in fp.read()]
Or, if you do not want to keep the whole list in memory at once for large files.
hex_list = ("{:02x}".format(ord(c)) for c in fp.read())
and to get the values, keep calling
next(hex_list)
to get all the remaining values from the generator
list(hex_list)
Using Python 3, let's assume the input file contains the sample bytes you show. For example, we can create it like this
>>> inp = bytes((170,12*16+13,255,15)) # i.e. b'\xaa\xcd\xff\x0f'
>>> with open(filename,'wb') as f:
... f.write(inp)
Now, given we want the hex representation of each byte in the input file, it would be nice to open the file in binary mode, without trying to interpret its contents as characters/strings (or we might trip on the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte)
>>> with open(filename,'rb') as f:
... buff = f.read() # it reads the whole file into memory
...
>>> buff
b'\xaa\xcd\xff\x0f'
>>> out_hex = ['{:02X}'.format(b) for b in buff]
>>> out_hex
['AA', 'CD', 'FF', '0F']
If the file is large, we might want to read one character at a time or in chunks. For that purpose I recommend to read this Q&A
Be aware that for viewing hexadecimal dumps of files, there are utilities available on most operating systems. If all you want to do is hex dump the file, consider one of these programs:
od (octal dump, which has a -x or -t x option)
hexdump
xd utility available under windows
Online hex dump tools, such as this one.
I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.
To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)
I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.
Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]
Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.
I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.
To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)
I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.
Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]
Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.
Suppose I have a number like 824 and I write it to a text file using python. In the text file, it will take 3 bytes space. However, If i represent it using bits, it has the following representation 0000001100111000 which is 2 bytes (16 bits). I was wondering how can I write bits to file in python, not bytes. If I can do that, the size of the file will be 2 bytes, not 3.
Please provide code. I am using python 2.6. Also, I do not want to use any external modules that do not come with the basic installation
I tried below and gave me 12 bytes!
a =824;
c=bin(a)
handle = open('try1.txt','wb')
handle.write(c)
handle.close()
The struct module is what you want. From your example, 824 = 0000001100111000 binary or 0338 hexadecimal. This is the two bytes 03H and 38H. struct.pack will convert 824 to a string of these two bytes, but you also have to decide little-endian (write the 38H first) or big-endian (write the 03H first).
Example
>>> import struct
>>> struct.pack('>H',824) # big-endian
'\x038'
>>> struct.pack('<H',824) # little-endian
'8\x03'
>>> struct.pack('H',824) # Use system default
'8\x03'
struct returns a two-byte string. the '\x##' notation means (a byte with hexadecimal value ##). the '8' is an ASCII '8' (value 38H). Python byte strings use ASCII for printable characters, and \x## notation for unprintable characters.
Below is an example writing and reading binary data to a file. You should always specify the endian-ness when writing to and reading from a binary file, in case it is read on a system with a different endian default:
import struct
a = 824
bin_data = struct.pack('<H',824)
print 'bin_data length:',len(bin_data)
with open('data.bin','wb') as f:
f.write(bin_data)
with open('data.bin','rb') as f:
bin_data = f.read()
print 'Value from file:',struct.unpack('<H',bin_data)[0]
print 'bin_data representation:',repr(bin_data)
for i,c in enumerate(bin_data):
print 'Byte {0} as binary: {1:08b}'.format(i,ord(c))
Output
bin_data length: 2
Value from file: 824
bin_data representation: '8\x03'
Byte 0 as binary: 00111000
Byte 1 as binary: 00000011
Have a look at struct:
>>> struct.pack("h", 824)
'8\x03'
I think what you want is to open the file in binary mode:
open("file.bla", "wb")
However, this will write an integer to the file, which will probably be 4 bytes in size. I do not know if Python has a 2 byte integer type. But you can circumvent that by encoding 2 16 bit number in one 32 bit number:
a = 824
b = 1234
c = (a << 16) + b