Read string from binary file - python

I want to read bytes 1, 2 and 3 from a file. I know they correspond to a string (in this case, the ELF in the header of a Linux binary).
Following examples I could find on the net, I came up with this:
import struct

with open('hello', 'rb') as f:
    f.seek(1)
    bytes = f.read(3)
    st = struct.unpack('s', bytes)
    print st
Looking at the official documentation of struct, it seems that passing s as the format should allow me to read a string.
Instead I get the error:
st = struct.unpack('s', bytes)
struct.error: unpack requires a string argument of length 1
EDIT: Using Python 2.7

In your special case, it is enough to just check
if bytes == 'ELF':
to test in one step whether the three bytes are the characters E, L and F.
If you want the numerical values as well, you still do not need to unpack anything here. Just use ord(bytes[i]) (with i in 0, 1, 2) to get the values of the three bytes.
Alternatively you can use
byte_values = struct.unpack('bbb', bytes)
to get a tuple of the three bytes. You can also unpack that tuple on the fly in case the bytes have nameable semantics like this:
width, height, depth = struct.unpack('bbb', bytes)
Use 'BBB' instead of 'bbb' if your byte values should be unsigned.
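As for the original error: a bare 's' format means a string of exactly one byte, so a repeat count is needed to match three bytes. A minimal Python 3 sketch, assuming an in-memory stand-in (the hypothetical fake_file) instead of the real 'hello' binary:

```python
import io
import struct

# Hypothetical stand-in for open('hello', 'rb'): the start of an ELF header.
fake_file = io.BytesIO(b"\x7fELF\x02\x01\x01\x00")

fake_file.seek(1)          # skip the leading 0x7f byte
raw = fake_file.read(3)    # bytes 1, 2 and 3

# '3s' reads a 3-byte string; a bare 's' would only accept 1 byte.
(magic,) = struct.unpack("3s", raw)

# 'BBB' reads the same three bytes as unsigned integers.
unsigned_values = struct.unpack("BBB", raw)

print(magic, unsigned_values)
```

Note that unpack always returns a tuple, hence the (magic,) unpacking on the left-hand side.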

In Python 2, read returns a string, in the sense of a "string of bytes". To get a single byte, use bytes[i]; it returns another string holding that single byte. If you need the numeric value of a byte, use ord: ord(bytes[i]). Finally, to get the numeric values of all bytes, use map(ord, bytes).
In [4]: s = "foo"
In [5]: s[0]
Out[5]: 'f'
In [6]: ord(s[0])
Out[6]: 102
In [7]: map(ord, s)
Out[7]: [102, 111, 111]
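For comparison, a short sketch of the same session under Python 3 (an assumption, since the question targets Python 2.7): there, read on a binary file returns a bytes object, indexing already yields an integer, and ord is not needed.

```python
data = b"foo"

first_value = data[0]      # indexing gives an int in Python 3
first_byte = data[0:1]     # slicing gives a one-byte bytes object
all_values = list(data)    # replaces map(ord, data)

print(first_value, first_byte, all_values)
```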

Related

Decoding Error: Int too big to convert while decrypting

I have encoded a string to integer in the following way in python:
b = bytearray()
b.extend(input_number_or_text.encode('ascii'))
input_number_or_text = int.from_bytes(b,byteorder='big', signed=False)
I am encrypting this integer to get a new value and subsequently decrypting to get back the original integer.
Now, how do I get the string back from the integer?
I have tried the following for the conversion after decryption:
decrypted_data.to_bytes(1,byteorder='big').decode('ascii')
but I get an "int too big to convert" error.
How can I fix this problem?
You told it the int should be convertible to a byte string of length 1. If it's longer, that won't work. You can remember the length, or you can compute the minimum length needed:
num_bytes = (decrypted_data.bit_length() + 7) // 8
decrypted_data.to_bytes(num_bytes, byteorder='big').decode('ascii')
Adding 7 and floor-dividing by 8 ensures enough bytes for the data. -(-decrypted_data.bit_length() // 8) also works (and is trivially faster on Python), but is a bit more magical looking.
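Put together, the round trip might look like the sketch below. The helper names text_to_int and int_to_text are made up for illustration, and the encryption step is left out entirely.

```python
def text_to_int(text):
    # Interpret the ASCII bytes of the text as one big-endian integer.
    return int.from_bytes(text.encode("ascii"), byteorder="big", signed=False)


def int_to_text(number):
    # Smallest number of bytes that can hold the integer.
    num_bytes = (number.bit_length() + 7) // 8
    return number.to_bytes(num_bytes, byteorder="big").decode("ascii")


original = "hello"
as_int = text_to_int(original)
restored = int_to_text(as_int)
print(as_int, restored)
```

One caveat of computing the length from bit_length: leading NUL bytes in the original text are not recoverable, because they do not change the integer's value. Storing the length alongside the integer avoids that.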
The byte representation of an integer is different from that of a string.
For example, 1, '1' and 1.0 all look different at the byte level.
From the code you supplied:
b.extend(input_number_or_text.encode('ascii'))
and int.from_bytes(b,byteorder='big', signed=False)
it seems you're encoding a string that contains a number and then trying to decode it as an int.
See the next example:
In [3]: b = bytearray()
In [4]: a = '1'
In [5]: b.extend(a.encode('ascii'))
In [6]: int.from_bytes(b,byteorder='big',signed=False)
Out[6]: 49
If you encoded a string, you should first decode it back to a string, and then convert that to an int.
In [1]: b = bytearray()
In [2]: a = '123'
In [3]: b.extend(a.encode('ascii'))
In [4]: decoded = int(b.decode('ascii'))
In [5]: decoded
Out[5]: 123

Python array[0:1] not the same as array[0]

I'm using Python to split a string of 2 bytes b'\x01\x00'. The string of bytes is stored in a variable called flags.
Why when I say flags[0] do I get b'\x00' but when I say flags[0:1] I get the expected answer of b'\x01'.
Should both of these operations not be exactly the same?
What I did:
>>> flags = b'\x01\x00'
>>> flags[0:1]
b'\x01'
>>> bytes(flags[0])
b'\x00'
In Python 3, bytes is a sequence type containing integers (each in the range 0 - 255) so indexing to a specific index gives you an integer.
And just like slicing a list produces a new list object for the slice, so does slicing a bytes object produce a new bytes instance. And the representation of a bytes instance tries to show you a b'...' literal syntax with the integers represented as either printable ASCII characters or an applicable escape sequence when the byte isn't printable. All this is great for developing but may hide the fact that bytes are really a sequence of integers.
However, you will still get the same piece of information; flags[0:1] is a one-byte long bytes value with the \x01 byte in it, and flags[0] will give you the integer 1:
>>> flags = b'\x01\x00'
>>> flags[0]
1
>>> flags[0:1]
b'\x01'
What you really did was not use flags[0], you used bytes(flags[0]) instead. Passing in a single integer to the bytes() type creates a new bytes object of the specified length, pre-filled with \x00 bytes:
>>> flags[0]
1
>>> bytes(1)
b'\x00'
Since flags[0] produces the integer 1, you told bytes() to return a new bytes value of length 1, filled with \x00 bytes.
From the bytes documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256.
[...]
In addition to the literal forms, bytes objects can be created in a number of other ways:
A zero-filled bytes object of a specified length: bytes(10)
Bold emphasis mine.
If you wanted to create a new bytes object with that one byte in it, you'll need to put the integer value in a list first:
>>> bytes([flags[0]])
b'\x01'
In Python 2, yes, you get the same thing in both cases: '\x01'. (In Python 3, flags[0] is the integer 1 instead.) If you see something else under Python 2, flags is probably not what you think it is.
>>> flags = b'\x01\x00'
>>> flags[0]
'\x01'
>>> flags[0:1]
'\x01'

struct.unpack 6 bytes into short and int fails. Why?

s = '\x01\x00\x02\x00\x00\x00'
struct.unpack('hi',s)
I expect to get (1,2), but instead get the error:
error: unpack requires a string argument of length 8
If I perform the two unpacks separately it works:
myshort = struct.unpack('h',s[:2])
myint = struct.unpack('i',s[2:])
Also, interestingly, it will accept it if the format string is 'ih' instead of 'hi'.
What am I missing?
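This is because of C structure alignment. In the default native mode, the int has to start on a 4-byte boundary on typical platforms, so two padding bytes are expected after the short, which makes 8 bytes in total. With 'ih' the int comes first and no padding is needed, which is why that order is accepted. If you want your data items to remain unaligned, prefix the format string with an = sign: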
>>> s = '\x01\x00\x02\x00\x00\x00'
>>> struct.unpack('=hi',s)
(1, 2)
Refer to section 7.3.2.1, "Byte Order, Size, and Alignment", of the struct documentation.
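The padding can be made visible with struct.calcsize. The following is a Python 3 sketch; the native sizes in the comments are what common platforms give, not a guarantee.

```python
import struct

data = b"\x01\x00\x02\x00\x00\x00"

# Native mode (no prefix): the int is aligned, typically giving 8.
native_size = struct.calcsize("hi")

# '=' uses standard sizes with no padding: always 2 + 4 = 6.
packed_size = struct.calcsize("=hi")

# int first: no padding is inserted before the short, typically 6.
reversed_size = struct.calcsize("ih")

# '<' also disables padding and pins the byte order to little-endian,
# so the result does not depend on the machine running the code.
values = struct.unpack("<hi", data)

print(native_size, packed_size, reversed_size, values)
```

The '=' prefix keeps the machine's native byte order, so '<' or '>' is the safer choice when the file format defines its own byte order.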

Python read a binary file and decode

I am quite new to Python and I need to solve this simple problem. There are already several similar questions, but I still cannot solve it.
I need to read a binary file, which is composed of several blocks of bytes. For example, the header is composed of 6 bytes, and I would like to extract those 6 bytes and transform them into a sequence of binary digits, like 000100110 011001 for example.
navatt_dir='C:/PROCESSING/navatt_read/'
navatt_filename='OSPS_FRMT_NAVATT____20130621T100954_00296_caseB.bin'
navatt_path=navatt_dir+navatt_filename
navatt_file=open(navatt_path, 'rb')
header=list(navatt_file.read(6))
print header
As the result of the list I get the following:
%run C:/PROCESSING/navatt_read/navat_read.py
['\t', 'i', '\xc0', '\x00', '\x00', 't']
which is not what I want.
I would also like to read a particular value in the binary file, knowing its position and length, without reading the whole file. Is that possible?
Thanks
ByteArray
A bytearray is a mutable sequence of bytes (integers where 0 ≤ x ≤ 255). You can construct a bytearray from a string (if it is not a byte-string, you will have to provide an encoding), an iterable of byte-sized integers, or an object with a buffer interface. You can of course just build it manually as well.
An example using a byte-string:
string = b'DFH'
b = bytearray(string)
# Print it as a string
print b
# Prints the individual bytes, showing you that it's just a list of ints
print [i for i in b]
# Let's add one to the D
b[0] += 1
# And print the string again to see the result!
print b
The result:
DFH
[68, 70, 72]
EFH
This is the type you want if you want raw byte manipulation. If what you want is to read 4 bytes as a 32bit int, one would use the struct module, with the unpack method, but I usually just shift them together myself from a bytearray.
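To make the two options concrete, here is a Python 3 sketch that reads four bytes as a big-endian unsigned 32-bit integer, once by shifting and once with struct. The sample bytes and the big-endian byte order are assumptions chosen for illustration.

```python
import struct

raw = bytearray(b"\x12\x34\x56\x78")

# Shift the four bytes together by hand (big-endian order).
shifted = (raw[0] << 24) | (raw[1] << 16) | (raw[2] << 8) | raw[3]

# The same with struct: '>' is big-endian, 'I' an unsigned 32-bit int.
(unpacked,) = struct.unpack(">I", raw)

# Python 3 also offers int.from_bytes for this.
converted = int.from_bytes(raw, byteorder="big")

print(hex(shifted), hex(unpacked), hex(converted))
```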
Printing the header in binary
What you seem to want is to take the string you have, convert it to a bytearray, and print them as a string in base 2/binary.
So here is a short example for how to write the header out (I read random data from a file named "dump"):
with open('dump', 'rb') as f:
    header = f.read(6)
b = bytearray(header)
print ' '.join([bin(i)[2:].zfill(8) for i in b])
After converting it to a bytearray, I call bin() on every single one, which gives back a string with the binary representation we need, in the format of "0b1010". I don't want the "0b", so I slice it off with [2:]. Then, I use the string method zfill, which allows me to have the required amount of 0's prepended for the string to be 8 long (which is the amount of bits we need), as bin will not show any unneeded zeroes.
If you're new to the language, the last line might look quite intimidating. It uses a list comprehension to make a list of all the binary strings we want to print, and then joins them into the final string with spaces between the elements.
A less pythonic/convoluted variant of the last line would be:
result = []
for byte in b:
    string = bin(byte)[2:]  # Make a binary string and slice off the leading "0b"
    result.append(string.zfill(8))  # Append a 0-padded version to the results list
# Join the list into a space-separated string and print it!
print ' '.join(result)
I hope this helps!
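The second part of the question, reading a value at a known position without reading the whole file, can be handled with seek. Below is a Python 3 sketch; read_block and to_bit_string are hypothetical helper names, and an in-memory stream built from the header bytes shown in the question stands in for the real file.

```python
import io


def read_block(stream, offset, length):
    # Jump straight to the offset and read only the requested bytes.
    stream.seek(offset)
    return stream.read(length)


def to_bit_string(block):
    # Format each byte as 8 binary digits, separated by spaces.
    return " ".join(format(byte, "08b") for byte in block)


# Stand-in for open(navatt_path, 'rb'): the 6 header bytes from the
# question followed by 4 made-up filler bytes.
stream = io.BytesIO(b"\ti\xc0\x00\x00t" + b"\xff" * 4)

header_bits = to_bit_string(read_block(stream, 0, 6))
tail_bits = to_bit_string(read_block(stream, 8, 2))
print(header_bits)
print(tail_bits)
```

Here format(byte, "08b") does the job of bin(i)[2:].zfill(8) in one step.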

How to convert hexadecimal string to bytes in Python?

I have a long Hex string that represents a series of values of different types. I need to convert this Hex String into bytes or bytearray so that I can extract each value from the raw data. How can I do this?
For example, the string "ab" should convert to the bytes b"\xab" or equivalent byte array. Longer example:
>>> # what to use in place of `convert` here?
>>> convert("8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b")
b'\x8eq\xc6\x1d\xe6\xa22\x136\x18O\x813y\xeck\xf4\xa3\xfby\xe6<\xd1+'
Suppose your hex string is something like
>>> hex_string = "deadbeef"
Convert it to a bytearray (Python 3 and 2.7):
>>> bytearray.fromhex(hex_string)
bytearray(b'\xde\xad\xbe\xef')
Convert it to a bytes object (Python 3):
>>> bytes.fromhex(hex_string)
b'\xde\xad\xbe\xef'
Note that bytes is an immutable version of bytearray.
Convert it to a string (Python ≤ 2.7):
>>> hex_data = hex_string.decode("hex")
>>> hex_data
"\xde\xad\xbe\xef"
There is a built-in function in bytearray that does what you intend.
bytearray.fromhex("de ad be ef 00")
It returns a bytearray and it reads hex strings with or without space separator.
Provided I understood correctly, you should look at binascii.unhexlify:
import binascii
a='45222e'
s=binascii.unhexlify(a)
b=[ord(x) for x in s]
Assuming you have a byte string like so
"\x12\x45\x00\xAB"
and you know the amount of bytes and their type you can also use this approach
import struct
bytes = '\x12\x45\x00\xAB'
val = struct.unpack('<BBH', bytes)
#val = (18, 69, 43776)
As I specified little endian (using the '<' char) at the start of the format string the function returned the decimal equivalent.
0x12 = 18
0x45 = 69
0xAB00 = 43776
B is equal to one byte (8 bit) unsigned
H is equal to two bytes (16 bit) unsigned
More format characters and their byte sizes are listed in the struct module documentation.
Advantages:
You can specify more than one byte and the endianness of the values.
Disadvantages:
You really need to know the type and length of the data you're dealing with.
You can use the codecs module in the Python Standard Library, i.e.
import codecs
codecs.decode(hexstring, 'hex_codec')
You should be able to build a string holding the binary data using something like:
data = "fef0babe"
bits = ""
for x in xrange(0, len(data), 2):
    bits += chr(int(data[x:x+2], 16))
This is probably not the fastest way (many string appends), but quite simple using only core Python.
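A Python 3 rendering of the same pair-by-pair idea, checked against the standard-library routes mentioned in the other answers, might look like this. The helper name hex_to_bytes is made up for the sketch.

```python
import binascii
import codecs


def hex_to_bytes(hex_string):
    # Convert each pair of hex digits to one integer in 0..255.
    pairs = (hex_string[i:i + 2] for i in range(0, len(hex_string), 2))
    return bytes(int(pair, 16) for pair in pairs)


data = "fef0babe"

manual = hex_to_bytes(data)
via_fromhex = bytes.fromhex(data)
via_binascii = binascii.unhexlify(data)
via_codecs = codecs.decode(data.encode("ascii"), "hex")

print(manual, via_fromhex, via_binascii, via_codecs)
```

Building the result with bytes(...) from a generator avoids the repeated string appends of the loop version.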
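A one-liner for getting the character codes is:
byte_list = map(ord, hex_string)
This iterates over each character in the string and runs it through the ord() function. Note that it yields the codes of the hex digits themselves (for example, 'd' gives 100), not the bytes that each pair of digits encodes, so it only helps once the string has already been decoded. Only tested on Python 2.6; in Python 3, map returns an iterator, so wrap it in list().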
-Josh
