Write a binary integer or string to a file in python - python

I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.

To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'

Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)

I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.

Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]

Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.

Related

Read binary file and check with matching character in python

I would like to scan through data files from GPS receiver byte-wise (actually it will be a continuous flow, not want to test the code with offline data). If find a match, then check the next 2 bytes for the 'length' and get the next 2 bytes and shift 2 bits(not byte) to the right, etc. I didn't handle binary before, so stuck in a simple task. I could read the binary file byte-by-byte, but can not find a way to match by desired pattern (i.e. D3).
with open("COM6_200417.ubx", "rb") as f:
byte = f.read(1) # read 1-byte at a time
while byte != b"":
# Do stuff with byte.
byte = f.read(1)
print(byte)
The output file is:
b'\x82'
b'\xc2'
b'\xe3'
b'\xb8'
b'\xe0'
b'\x00'
b'#'
b'\x13'
b'\x05'
b'!'
b'\xd3'
b'\x00'
b'\x13'
....
how to check if that byte is == '\xd3'? (D3)
also would like to know how to shift bit-wise, as I need to check decimal value consisting of 6 bits
(1-byte and next byte's first 2-bits). Considering, taking 2-bytes(8-bits) and then 2-bit right-shift
to get 6-bits. Is it possible in python? Any improvement/addition/changes are very much appreciated.
ps. can I get rid of that pesky 'b' from the front? but if ignoring it does not affect then no problem though.
Thanks in advance.
'That byte' is represented with a b'' in front, indicating that it is a byte object. To get rid of it, you can convert it to an int:
thatbyte = b'\xd3'
byteint = thatbyte[0] # or
int.from_bytes(thatbyte, 'big') # 'big' or 'little' endian, which results in the same when converting a single byte
To compare, you can do:
thatbyte == b'\xd3'
Thus compare a byte object with another byte object.
The shift << operator works on int only
To convert an int back to bytes (assuming it is [0..255]) you can use:
bytes([byteint]) # note the extra brackets!
And as for improvements, I would suggest to read the whole binary file at once:
with open("COM6_200417.ubx", "rb") as f:
allbytes = f.read() # read all
for val in allbytes:
# Do stuff with val, val is int !!!
print(bytes([val]))

Write bitstream to file python 3 [duplicate]

I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.
To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)
I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.
Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]
Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.

Convert binary data to web-safe text and back - Python

I want to convert a binary file (such as a jpg, mp3, etc) to web-safe text and then back into binary data. I've researched a few modules and I think I'm really close but I keep getting data corruption.
After looking at the documentation for binascii I came up with this:
from binascii import *
raw_bytes = open('test.jpg','rb').read()
text = b2a_qp(raw_bytes,quotetabs=True,header=False)
bytesback = a2b_qp(text,header=False)
f = open('converted.jpg','wb')
f.write(bytesback)
f.close()
When I try to open the converted.jpg I get data corruption :-/
I also tried using b2a_base64 with 57-long blocks of binary data. I took each block, converted to a string, concatenated them all together, and then converted back in a2b_base64 and got corruption again.
Can anyone help? I'm not super knowledgeable on all the intricacies of bytes and file formats. I'm using Python on Windows if that makes a difference with the \r\n stuff
Your code looks quite complicated. Try this:
#!/usr/bin/env python
from binascii import *
raw_bytes = open('28.jpg','rb').read()
i = 0
str_one = b2a_base64(raw_bytes) # 1
str_list = b2a_base64(raw_bytes).split("\n") #2
bytesBackAll = a2b_base64(''.join(str_list)) #2
print bytesBackAll == raw_bytes #True #2
bytesBackAll = a2b_base64(str_one) #1
print bytesBackAll == raw_bytes #True #1
Lines tagged with #1 and #2 represent alternatives to each other. #1 seems most straightforward to me - just make it one string, process it and convert it back.
You should use base64 encoding instead of quoted printable. Use b2a_base64() and a2b_base64().
Quoted printable is much bigger for binary data like pictures. In this encoding each binary (non alphanumeric character) code is changed into =HEX. It can be used for texts that consist mainly of alphanumeric like email subjects.
Base64 is much better for mainly binary data. It takes 6 bites of first byte, then last 2 bits of 1st byte and 4 bites from 2nd byte. etc. It can be recognized by = padding at the end of the encoded text (sometimes other character is used).
As an example I took .jpeg of 271 700 bytes. In qp it is 627 857 b while in base64 it is 362 269 bytes. Size of qp is dependent of data type: text which is letters only do not change. Size of base64 is orig_size * 8 / 6.
Your documentation reference is for Python 3.0.1. There is no good reason using Python 3.0. You should be using 3.2 or 2.7. What exactly are you using?
Suggestion: (1) change bytes to raw_bytes to avoid confusion with the bytes built-in (2) check for raw_bytes == bytes_back in your test script (3) while your test should work with quoted-printable, it is very inefficient for binary data; use base64 instead.
Update: Base64 encoding produces 4 output bytes for every 3 input bytes. Your base64 code doesn't work with 56-byte chunks because 56 is not an integral multiple of 3; each chunk is padded out to a multiple of 3. Then you join the chunks and attempt to decode, which is guaranteed not to work.
Your chunking loop would be much better written as:
output_string = ''.join(
b2a_base64(raw_bytes[i:i+57]) for i in xrange(0, xrange(len(raw_bytes), 57)
)
In any case, chunking is rather slow and pointless; just do b2a_base64(raw_bytes)
#PMC's answer copied from the question:
Here's what works:
from binascii import *
raw_bytes = open('28.jpg','rb').read()
str_list = []
i = 0
while i < len(raw_bytes):
byteSegment = raw_bytes[i:i+57]
str_list.append(b2a_base64(byteSegment))
i += 57
bytesBackAll = a2b_base64(''.join(str_list))
print bytesBackAll == raw_bytes #True
Thanks for the help guys. I'm not sure why this would fail with [0:56] instead of [0:57] but I'll leave that as an exercise for the reader :P

How to Read-in Binary of a File in Python

In Python, when I try to read in an executable file with 'rb', instead of getting the binary values I expected (0010001 etc.), I'm getting a series of letters and symbols that I do not know what to do with.
Ex: ???}????l?S??????V?d?\?hG???8?O=(A).e??????B??$????????: ???Z?C'???|lP#.\P?!??9KRI??{F?AB???5!qtWI??8𜐮???!ᢉ?]?zъeF?̀z??/?n??
How would I access the binary numbers of a file in Python?
Any suggestions or help would be appreciated. Thank you in advance.
That is the binary. They are stored as bytes, and when you print them, they are interpreted as ASCII characters.
You can use the bin() function and the ord() function to see the actual binary codes.
for value in enumerate(data):
print bin(ord(value))
Byte sequences in Python are represented using strings. The series of letters and symbols that you see when you print out a byte sequence is merely a printable representation of bytes that the string contains. To make use of this data, you usually manipulate it in some way to obtain a more useful representation.
You can use ord(x) or bin(x) to obtain decimal and binary representations, respectively:
>>> f = open('/tmp/IMG_5982.JPG', 'rb')
>>> data = f.read(10)
>>> data
'\x00\x00II*\x00\x08\x00\x00\x00'
>>> data[2]
'I'
>>> ord(data[2])
73
>>> hex(ord(data[2]))
'0x49'
>>> bin(ord(data[2]))
'0b1001001'
>>> f.close()
The 'b' flag that you pass to open() does not tell Python anything about how to represent the file contents. From the docs:
Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
Unless you just want to look at what the binary data from the file looks like, Mark Pilgrim's book, Dive Into Python, has an example of working with binary file formats. The example shows how you can read IDv1 tags from an MP3 file. The book's website seems to be down, so I'm linking to a mirror.
Each character in the string is the ASCII representation of a binary byte. If you want it as a string of zeros and ones then you can convert each byte to an integer, format it as 8 binary digits and join everything together:
>>> s = "hello world"
>>> ''.join("{0:08b}".format(ord(x)) for x in s)
'0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
Depending on if you really need to analyse / manipulate things at the binary level an external module such as bitstring could be helpful. Check out the docs; to just get the binary interpretation use something like:
>>> f = open('somefile', 'rb')
>>> b = bitstring.Bits(f)
>>> b.bin
0100100101001001...
Use ord(x) to get the integer value of each byte.
>>> with open('settings.dat', 'rb') as file:
... data = file.read()
...
>>> for index, value in enumerate(data):
... print '0x%08x 0x%02x' % (index, ord(value))
...
0x00000000 0x28
0x00000001 0x64
0x00000002 0x70
0x00000003 0x30
0x00000004 0x0d
0x00000005 0x0a
0x00000006 0x53
0x00000007 0x27
0x00000008 0x4d
0x00000009 0x41
0x0000000a 0x49
0x0000000b 0x4e
0x0000000c 0x5f
0x0000000d 0x57
0x0000000e 0x49
0x0000000f 0x4e
If you realy want to convert the binaray bytes to a stream of bits, you have to remove the first two chars ('0b') from the output of bin() and reverse the result:
with open("settings.dat", "rb") as fp:
print "".join( (bin(ord(c))[2:][::-1]).ljust(8,"0") for c in fp.read() )
If you use Python prior to 2.6, you have no bin() function.

How to write individual bits to a text file in python?

Suppose I have a number like 824 and I write it to a text file using python. In the text file, it will take 3 bytes space. However, If i represent it using bits, it has the following representation 0000001100111000 which is 2 bytes (16 bits). I was wondering how can I write bits to file in python, not bytes. If I can do that, the size of the file will be 2 bytes, not 3.
Please provide code. I am using python 2.6. Also, I do not want to use any external modules that do not come with the basic installation
I tried below and gave me 12 bytes!
a =824;
c=bin(a)
handle = open('try1.txt','wb')
handle.write(c)
handle.close()
The struct module is what you want. From your example, 824 = 0000001100111000 binary or 0338 hexadecimal. This is the two bytes 03H and 38H. struct.pack will convert 824 to a string of these two bytes, but you also have to decide little-endian (write the 38H first) or big-endian (write the 03H first).
Example
>>> import struct
>>> struct.pack('>H',824) # big-endian
'\x038'
>>> struct.pack('<H',824) # little-endian
'8\x03'
>>> struct.pack('H',824) # Use system default
'8\x03'
struct returns a two-byte string. the '\x##' notation means (a byte with hexadecimal value ##). the '8' is an ASCII '8' (value 38H). Python byte strings use ASCII for printable characters, and \x## notation for unprintable characters.
Below is an example writing and reading binary data to a file. You should always specify the endian-ness when writing to and reading from a binary file, in case it is read on a system with a different endian default:
import struct
a = 824
bin_data = struct.pack('<H',824)
print 'bin_data length:',len(bin_data)
with open('data.bin','wb') as f:
f.write(bin_data)
with open('data.bin','rb') as f:
bin_data = f.read()
print 'Value from file:',struct.unpack('<H',bin_data)[0]
print 'bin_data representation:',repr(bin_data)
for i,c in enumerate(bin_data):
print 'Byte {0} as binary: {1:08b}'.format(i,ord(c))
Output
bin_data length: 2
Value from file: 824
bin_data representation: '8\x03'
Byte 0 as binary: 00111000
Byte 1 as binary: 00000011
Have a look at struct:
>>> struct.pack("h", 824)
'8\x03'
I think what you want is to open the file in binary mode:
open("file.bla", "wb")
However, this will write an integer to the file, which will probably be 4 bytes in size. I do not know if Python has a 2 byte integer type. But you can circumvent that by encoding 2 16 bit number in one 32 bit number:
a = 824
b = 1234
c = (a << 16) + b

Categories

Resources