Let's say I have the following ELF file in python:
>>> data=open('file','rb').read()
>>> data
b'\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00>\x00\x01\x00\x00\x00x\x00#\x00\x00\x00\x00\x00#\x00\x00\x00\x00\x00\x00\x00X\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00#\x008\x00\x01\x00#\x00\x05\x00\x04\x00\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00#\x00\x00\x00\x00\x00\x00\x00#\x00\x00\x00\x00\x00\x84\x00\x00\x00\x00\x00\x00\x00\x84\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\xbf\x03\x00\x00\x00\xb8<\x00\x00\x00\x0f\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x01\x00x\x00#\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x10\x00\x01\x00\x84\x00`\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x00\x00\x00\x10\x00\x01\x00\x84\x00`\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14\x00\x00\x00\x10\x00\x01\x00\x88\x00`\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00__bss_start\x00_edata\x00_end\x00\x00.symtab\x00.strtab\x00.shstrtab\x00.text\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1b\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00x\x00#\x00\x00\x00\x00\x00x\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x88\x00\x00\x00\x00\x00\x00\x00\x90\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x18\x01\x00\x00\x00\x00\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x001\x01\x00\x00\x00\x00\x00\x00!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
The first 7 bytes are, in hex:
0x7F 0x45 ("E") 0x4c ("L") 0x46 ("F") 0x02 0x01 0x01
How would I change the 5th byte to 1 and save the file? Something like:
data[4]=1 # raises TypeError: 'bytes' object does not support item assignment
open('newfile','wb').write(data)
Convert it to a bytearray, which is a mutable sequence of bytes, then modify it and re-save the file. From the docs:
The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has
# open the file in binary mode and convert to a bytearray
with open('file', 'rb') as f:
    barray = bytearray(f.read())
# modify the 5th byte (index 4) in the array
barray[4] = 1
# write out in binary mode
with open('newfile', 'wb') as f:
    f.write(barray)
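When the data is already in memory, the same edit can be sketched as a small helper that leaves the original object untouched. The name `patch_byte` is hypothetical, introduced only for illustration; it copies the immutable bytes into a bytearray, changes one position, and returns a new bytes object.

```python
def patch_byte(data: bytes, index: int, value: int) -> bytes:
    """Return a copy of data with the byte at index replaced by value."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in the range 0..255")
    patched = bytearray(data)   # mutable copy of the immutable bytes
    patched[index] = value      # item assignment works on a bytearray
    return bytes(patched)


header = b"\x7fELF\x02\x01\x01"         # first 7 bytes from the question
new_header = patch_byte(header, 4, 1)   # index 4 is the 5th byte
print(new_header)
```

Indexing is zero-based, so the 5th byte is at index 4, not 5.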
Related
I would like to scan through data files from a GPS receiver byte-wise (it will eventually be a continuous flow, but for now I want to test the code with offline data). If I find a match, I then check the next 2 bytes for the 'length', get the next 2 bytes and shift 2 bits (not bytes) to the right, etc. I haven't handled binary data before, so I am stuck on a simple task. I can read the binary file byte-by-byte, but cannot find a way to match my desired pattern (i.e. D3).
with open("COM6_200417.ubx", "rb") as f:
    byte = f.read(1)  # read 1 byte at a time
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
        print(byte)
The output is:
b'\x82'
b'\xc2'
b'\xe3'
b'\xb8'
b'\xe0'
b'\x00'
b'#'
b'\x13'
b'\x05'
b'!'
b'\xd3'
b'\x00'
b'\x13'
....
How do I check whether that byte equals b'\xd3' (D3)?
I would also like to know how to shift bit-wise, as I need to check a decimal value consisting of 6 bits
(1 byte and the next byte's first 2 bits). I am considering taking the 2 bytes and then right-shifting by 2 bits
to get the 6 bits. Is that possible in Python? Any improvements, additions, or changes are very much appreciated.
P.S. Can I get rid of that pesky 'b' at the front? If ignoring it has no effect, then it is not a problem.
Thanks in advance.
'That byte' is displayed with a b'' in front, indicating that it is a bytes object. To get rid of it, you can convert it to an int:
thatbyte = b'\xd3'
byteint = thatbyte[0] # or
int.from_bytes(thatbyte, 'big') # 'big' or 'little' endian, which results in the same when converting a single byte
To compare, you can do:
thatbyte == b'\xd3'
That is, compare a bytes object with another bytes object.
The shift operators << and >> work on int only.
To convert an int back to bytes (assuming it is in the range 0..255), you can use:
bytes([byteint]) # note the extra brackets!
As for improvements, I would suggest reading the whole binary file at once:
with open("COM6_200417.ubx", "rb") as f:
    allbytes = f.read()  # read all
    for val in allbytes:
        # Do stuff with val; val is an int!
        print(bytes([val]))
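The marker search and the bit shifting asked about can be sketched as follows. The sample bytes are the tail of the output shown in the question; the field layout (two bytes after the marker, read big-endian) is an assumption made only for illustration, not the receiver's actual message format.

```python
sample = bytes([0x05, 0x21, 0xD3, 0x00, 0x13])  # tail of the output shown above

pos = sample.find(b"\xd3")          # index of the first 0xD3, or -1 if absent
# Assumed layout: treat the two bytes after the marker as one 16-bit integer.
word = int.from_bytes(sample[pos + 1:pos + 3], "big")
shifted = word >> 2                 # right shift by 2 bits drops the two lowest bits
top_six = sample[pos + 1] >> 2      # upper six bits of the first byte on its own
low_ten = word & 0x3FF              # mask keeps only the lowest ten bits
print(pos, word, shifted, top_six, low_ten)
```

Shifts and masks operate on int values, which is what indexing a bytes object already gives, so no conversion step is needed.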
I just finished creating a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(). I'm now attempting to decompress it. My only concern is that I cannot convert my byte array back into a string. Is there any built-in function I could use to convert my byte array (held in a variable) back into a string? If not, is there a better way to convert my compressed string to something else? I attempted to use byte_array.decode() and I get this:
import sys

print("Index: ", Index)  # The Index
# Substituting text to our compressed index
for x in range(len(TextTest)):
    TextTest[x] = Index[TextTest[x]]
NewText = ''.join(TextTest)
# print(NewText)
# NewText=int(NewText)
byte_array = bytearray()  # Converts the compressed string text to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))
NewSize = ("Compressed file Size:", sys.getsizeof(byte_array), 'bytes')
print(byte_array)
print(NewSize)
x = bytes(byte_array)
x.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte
You can use .decode('ascii'), or call .decode() with no argument for UTF-8. This only works when the bytes really are text in that encoding.
>>> print(bytearray("abcd", 'utf-8').decode())
abcd
Source: Convert bytes to a string?
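Compressed output is arbitrary binary data rather than encoded text, which is why .decode() raises UnicodeDecodeError on it. A minimal sketch of one way to get the original string of '0' and '1' characters back is to format each byte as eight binary digits. The helper names `pack_bits` and `unpack_bits` are hypothetical, and storing the bit count alongside the bytes is an assumption about how the padding in the last byte could be handled.

```python
def pack_bits(bits: str):
    """Pack a string of '0'/'1' into bytes; also return the true bit count."""
    padded = bits + "0" * (-len(bits) % 8)   # pad the tail to a whole byte
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, len(bits)


def unpack_bits(packed: bytes, bit_count: int) -> str:
    """Rebuild the bit string and drop the padding added by pack_bits."""
    return "".join(format(b, "08b") for b in packed)[:bit_count]


bits = "1000100001011"                 # 13 bits, not a multiple of 8
packed, count = pack_bits(bits)
restored = unpack_bits(packed, count)
print(packed, count, restored == bits)
```

Without the stored count, leading zeros in a short final chunk would be lost when the chunk is converted with int(..., 2).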
I have a bmp file that I read in my Python program. Once I have read in the bytes, I want to do bit-wise operations on each byte I read in. My program is:
with open("ship.bmp", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
        print(byte)
output:
b'\xfe'
I was wondering how I can manipulate that, i.e. convert it to bits? Some general pointers would be good. I lack experience with Python, so any help would be appreciated!
bytes objects yield integers from 0 through 255 inclusive when indexed. So, just perform the bit manipulation on the result of indexing.
>>> b'\xfe'[0]
254
>>> b'\xfe'[0] ^ 0x55
171
file.read(1) constructs a length-1 bytes object, which is overkill when you want the byte as an integer. To access each byte as an integer, the following is more succinct and has the benefit of using a for loop.
with open("ship.bmp", "rb") as f:
    byte_data = f.read()
for byte in byte_data:
    # do stuff with byte, e.g.
    result = byte & 0x2
    ...
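A few common bit-wise operations on a single byte value can be sketched like this, using the b'\xfe' value from the question. The variable names are illustrative only.

```python
value = b"\xfe"[0]                 # indexing bytes gives an int: 254

as_bits = format(value, "08b")     # eight binary digits as a string
bit1_is_set = bool(value & 0x02)   # test bit 1 with a mask
cleared = value & ~0x02 & 0xFF     # clear bit 1, keep the result within one byte
with_bit0 = value | 0x01           # set bit 0
flipped = value ^ 0x55             # toggle the bits selected by the mask
print(as_bits, bit1_is_set, cleared, with_bit0, flipped)
```

Python ints are unbounded, so masking with 0xFF after `~` keeps the result in the 0..255 range.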
In Python, when I try to read in an executable file with 'rb', instead of getting the binary values I expected (0010001 etc.), I'm getting a series of letters and symbols that I do not know what to do with.
Ex: ???}????l?S??????V?d?\?hG???8?O=(A).e??????B??$????????: ???Z?C'???|lP#.\P?!??9KRI??{F?AB???5!qtWI??8???!ᢉ?]?zъeF?̀z??/?n??
How would I access the binary numbers of a file in Python?
Any suggestions or help would be appreciated. Thank you in advance.
That is the binary. They are stored as bytes, and when you print them, they are interpreted as ASCII characters.
You can use the bin() function and the ord() function to see the actual binary codes.
for value in data:
    print bin(ord(value))
In Python 2, byte sequences are represented using strings (in Python 3 they are bytes objects). The series of letters and symbols that you see when you print a byte sequence is merely a printable representation of the bytes that the string contains. To make use of this data, you usually manipulate it in some way to obtain a more useful representation.
You can use ord(x) or bin(x) to obtain decimal and binary representations, respectively:
>>> f = open('/tmp/IMG_5982.JPG', 'rb')
>>> data = f.read(10)
>>> data
'\x00\x00II*\x00\x08\x00\x00\x00'
>>> data[2]
'I'
>>> ord(data[2])
73
>>> hex(ord(data[2]))
'0x49'
>>> bin(ord(data[2]))
'0b1001001'
>>> f.close()
The 'b' flag that you pass to open() does not tell Python anything about how to represent the file contents. From the docs:
Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
Unless you just want to look at what the binary data from the file looks like, Mark Pilgrim's book, Dive Into Python, has an example of working with binary file formats. The example shows how you can read ID3v1 tags from an MP3 file. The book's website seems to be down, so I'm linking to a mirror.
Each character in the string is the ASCII representation of a binary byte. If you want it as a string of zeros and ones then you can convert each byte to an integer, format it as 8 binary digits and join everything together:
>>> s = "hello world"
>>> ''.join("{0:08b}".format(ord(x)) for x in s)
'0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
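The snippets above are written for Python 2, where reading a file gives a str and ord() is required. As a sketch of the Python 3 equivalent: iterating over a bytes object already yields integers, so ord() is dropped. The text is encoded here only to stand in for data read from a file opened with 'rb'.

```python
text = "hello world"
raw = text.encode("ascii")    # bytes, as f.read() returns in 'rb' mode

# Each item of a bytes object is already an int in Python 3.
bit_string = "".join(format(b, "08b") for b in raw)
print(bit_string)
```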
Depending on whether you really need to analyse or manipulate things at the binary level, an external module such as bitstring could be helpful. Check out the docs; to just get the binary interpretation use something like:
>>> import bitstring
>>> f = open('somefile', 'rb')
>>> b = bitstring.Bits(f)
>>> b.bin
0100100101001001...
Use ord(x) to get the integer value of each byte.
>>> with open('settings.dat', 'rb') as file:
...     data = file.read()
...
>>> for index, value in enumerate(data):
...     print '0x%08x 0x%02x' % (index, ord(value))
...
0x00000000 0x28
0x00000001 0x64
0x00000002 0x70
0x00000003 0x30
0x00000004 0x0d
0x00000005 0x0a
0x00000006 0x53
0x00000007 0x27
0x00000008 0x4d
0x00000009 0x41
0x0000000a 0x49
0x0000000b 0x4e
0x0000000c 0x5f
0x0000000d 0x57
0x0000000e 0x49
0x0000000f 0x4e
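A Python 3 sketch of the same offset/value dump, for comparison. The function name `hex_dump_lines` is hypothetical, and the sample bytes are the first six values from the output above rather than a real settings.dat.

```python
def hex_dump_lines(data: bytes) -> list:
    """Return one 'offset value' line per byte, both in hex."""
    # enumerate() over bytes yields (index, int) pairs in Python 3.
    return ["0x%08x 0x%02x" % (index, value) for index, value in enumerate(data)]


sample = b"(dp0\r\n"           # 0x28 0x64 0x70 0x30 0x0d 0x0a
for line in hex_dump_lines(sample):
    print(line)
```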
If you really want to convert the binary bytes to a stream of bits, you have to remove the first two characters ('0b') from the output of bin() and reverse the result, which gives each byte least-significant bit first:
with open("settings.dat", "rb") as fp:
    print "".join( (bin(ord(c))[2:][::-1]).ljust(8,"0") for c in fp.read() )
If you use Python prior to 2.6, you have no bin() function.
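Whether the bits should come out most-significant or least-significant bit first depends on what consumes the stream. A Python 3 sketch of both orders follows; the helper names are hypothetical.

```python
def bits_msb_first(data: bytes) -> str:
    """Each byte as eight digits, most-significant bit first."""
    return "".join(format(b, "08b") for b in data)


def bits_lsb_first(data: bytes) -> str:
    """Each byte as eight digits, least-significant bit first."""
    return "".join(format(b, "08b")[::-1] for b in data)


sample = b"\x28"                   # 0x28 is 00101000 in binary
print(bits_msb_first(sample))
print(bits_lsb_first(sample))
```

Padding to eight digits before reversing gives the same result as reversing first and then right-padding with zeros.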
Suppose I have a number like 824 and I write it to a text file using Python. In the text file it takes 3 bytes of space. However, if I represent it using bits, it has the representation 0000001100111000, which is 2 bytes (16 bits). How can I write bits to a file in Python, not bytes? If I can do that, the size of the file will be 2 bytes, not 3.
Please provide code. I am using Python 2.6. Also, I do not want to use any external modules that do not come with the basic installation.
I tried below and gave me 12 bytes!
a =824;
c=bin(a)
handle = open('try1.txt','wb')
handle.write(c)
handle.close()
The struct module is what you want. From your example, 824 = 0000001100111000 binary or 0338 hexadecimal. This is the two bytes 03H and 38H. struct.pack will convert 824 to a string of these two bytes, but you also have to decide little-endian (write the 38H first) or big-endian (write the 03H first).
Example
>>> import struct
>>> struct.pack('>H',824) # big-endian
'\x038'
>>> struct.pack('<H',824) # little-endian
'8\x03'
>>> struct.pack('H',824) # Use system default
'8\x03'
struct.pack returns a two-byte string. The '\x##' notation means a byte with hexadecimal value ##. The '8' is an ASCII '8' (value 38H). Python byte strings use ASCII for printable characters and \x## notation for unprintable characters.
Below is an example writing and reading binary data to a file. You should always specify the endian-ness when writing to and reading from a binary file, in case it is read on a system with a different endian default:
import struct

a = 824
bin_data = struct.pack('<H', a)  # little-endian unsigned 16-bit
print 'bin_data length:',len(bin_data)

with open('data.bin','wb') as f:
    f.write(bin_data)

with open('data.bin','rb') as f:
    bin_data = f.read()

print 'Value from file:',struct.unpack('<H',bin_data)[0]
print 'bin_data representation:',repr(bin_data)
for i,c in enumerate(bin_data):
    print 'Byte {0} as binary: {1:08b}'.format(i,ord(c))
Output
bin_data length: 2
Value from file: 824
bin_data representation: '8\x03'
Byte 0 as binary: 00111000
Byte 1 as binary: 00000011
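The example above targets Python 2. A Python 3 sketch of the same round trip is shown here; the scratch file location created with tempfile is an assumption made so the snippet can run anywhere, and int.to_bytes is included as a built-in alternative to struct.

```python
import os
import struct
import tempfile

value = 824
packed = struct.pack("<H", value)      # two bytes, little-endian unsigned 16-bit

# Hypothetical scratch file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(packed)
with open(path, "rb") as f:
    raw = f.read()

restored = struct.unpack("<H", raw)[0]
same_bytes = value.to_bytes(2, "little")   # same two bytes without struct
print(len(raw), restored, raw == same_bytes)
for i, b in enumerate(raw):                # bytes iterate as ints, no ord()
    print("Byte {0} as binary: {1:08b}".format(i, b))
```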
Have a look at struct:
>>> struct.pack("h", 824)
'8\x03'
I think what you want is to open the file in binary mode:
open("file.bla", "wb")
However, writing an integer this way will probably take 4 bytes. I do not know whether Python has a 2-byte integer type, but you can work around that by encoding two 16-bit numbers in one 32-bit number:
a = 824
b = 1234
c = (a << 16) + b
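The combined value can be split back into its two halves with a shift and a mask. The sketch below also compares it with packing the two numbers directly through struct, which is an alternative not mentioned in this answer; big-endian order is chosen here only so the two results line up.

```python
import struct

a = 824
b = 1234
c = (a << 16) | b          # a in the upper 16 bits, b in the lower 16 bits

high = c >> 16             # recover a
low = c & 0xFFFF           # recover b

as_one = struct.pack(">I", c)       # one 32-bit unsigned integer, 4 bytes
as_two = struct.pack(">HH", a, b)   # two 16-bit unsigned integers, 4 bytes
print(high, low, as_one == as_two)
```

Both values must fit in 16 bits (0..65535) for the round trip to be lossless.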