How to write/pack a binary string to file in Python

I would like to do a simple operation, but I can't manage it. I have a string of '0' and '1' characters produced by a coding algorithm. I would like to write it to a file, but I think I'm doing it wrong.
My string is something like '11101010000......10000101010'
Currently I'm writing the binary file like this:
print 'WRITE TO FILE '
with open('file.bin', 'wb') as f:
    f.write(my_coded_string)
print 'READ FROM FILE'
with open('file.bin', 'rb') as f:
    myArr = bytearray(f.read())
myArr = str(myArr)
If I look at the size of the file, it is pretty big, so I guess I'm using an entire byte to write each 1 and 0. Is that correct?
I have found some examples that use the struct module, but I didn't manage to understand how it works.
Thanks!

Because the input is a string, Python writes each bit as a character, i.e. one whole byte per '0' or '1'.
You can write your bit streams with the third-party bitarray module, like this:
from bitarray import bitarray
bits = '110010111010001110'  # avoid shadowing the built-in str
a = bitarray(bits)
with open('file.bin', 'wb') as f:
    a.tofile(f)  # packs 8 bits into each byte
b = bitarray()
with open('file.bin', 'rb') as f:
    b.fromfile(f)
print(b)
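Note that tofile() zero-pads the last byte, so b read back above can hold up to 7 more bits than a. If the exact length matters, one option is to store it yourself. The following Python 3 sketch uses only the standard library (no bitarray) and uses struct to write a 4-byte big-endian bit count before the packed bits. The helper names write_bits/read_bits and this header layout are assumptions for illustration, not a standard file format.

```python
import io
import struct


def write_bits(f, bits):
    """Write a '0'/'1' string to binary file f: 4-byte bit count, then 8 bits per byte."""
    padded = bits + '0' * (-len(bits) % 8)  # right-pad to a multiple of 8
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    f.write(struct.pack('>I', len(bits)))  # assumed header: unsigned 32-bit, big-endian
    f.write(packed)


def read_bits(f):
    """Read back what write_bits wrote and drop the padding bits."""
    (nbits,) = struct.unpack('>I', f.read(4))
    data = f.read()
    return ''.join(format(byte, '08b') for byte in data)[:nbits]


buf = io.BytesIO()  # stands in for open('file.bin', 'wb')
write_bits(buf, '110010111010001110')
buf.seek(0)
print(read_bits(buf))       # 110010111010001110
print(len(buf.getvalue()))  # 7 (4-byte header + 3 packed bytes)
```

The 18 characters that took 18 bytes as text now take 3 bytes of payload.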

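Use this:
import re
text = "01010101010000111100100101"
# right-pad with zeros so that the last group also has 8 bits
padded = text + '0' * (-len(text) % 8)
# split into 8-character groups and convert each one to a byte value 0-255
byte_values = bytearray(int(sbyte, 2) for sbyte in re.findall('.{8}', padded))
to obtain the bytes, which can then be written to a binary file with:
with open('output.bin', 'wb') as f:
    f.write(byte_values)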
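Going the other way, here is a sketch for turning the bytes read from output.bin back into a '0'/'1' string. The file only holds whole bytes, so any bits added to fill out the last byte cannot be told apart from data; the original length has to be known separately, which is why nbits here is an assumed, caller-supplied parameter.

```python
def bytes_to_bits(data, nbits=None):
    """Turn bytes back into a '0'/'1' string, 8 characters per byte.

    nbits is an assumed, caller-supplied bit count used to drop trailing padding.
    """
    bits = ''.join(format(byte, '08b') for byte in bytearray(data))
    return bits if nbits is None else bits[:nbits]


# With a real file you would pass open('output.bin', 'rb').read() instead.
print(bytes_to_bits(b'\x55\x43\xc9'))          # 010101010100001111001001
print(bytes_to_bits(b'\x55\x43\xc9\x40', 26))  # 01010101010000111100100101
```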

Related

Easy way to switch endianness of a string

I need to read a binary file and write its contents as a text file that will initialize a memory model. The problem is that I need to switch endianness in the process. Let's look at an example.
The binary file content, when I read it with:
with open(source_name, mode='rb') as file:
    fileContent = file.read().hex()
is fileContent == "aa000000bb000000...".
I need to transform that into "000000aa000000bb...".
Of course, I can split this string into a list of 8-character substrings, then manually reorganize each one like newsubstr = substr[6:8]+substr[4:6]+substr[2:4]+substr[0:2], and then merge them into the result string, but that seems clumsy. I suppose there is a more natural way to do this in Python.
Thanks to k1m190r, I found out about the struct module, which looks like what I need, but I'm still lost. I came up with another clumsy solution:
import struct

with open(source_name, mode='rb') as file:
    fileContent = file.read()
# pad with zero bytes up to a multiple of 4
while len(fileContent) % 4 != 0:
    fileContent += b"\x00"
res = ""
for i in range(0, len(fileContent), 4):
    substr = fileContent[i:i+4]
    substr_val = struct.unpack("<L", substr)[0]
    res += struct.pack(">L", substr_val).hex()
Is there a more elegant way? This solution is just slightly better than the original.
Actually, in your specific case you don't even need struct. The following should be sufficient:
from binascii import b2a_hex

# open both files in binary mode
with open("infile", "rb") as infile, open("outfile", "wb") as outfile:
    # read 4 bytes at a time until read() returns an empty byte string b""
    for x in iter(lambda: infile.read(4), b""):
        if len(x) != 4:
            # skip the last chunk if it is shorter than 4 bytes
            break
        # reverse the 4 bytes and write them out as hex text
        outfile.write(b2a_hex(x[::-1]))
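The same 4-byte reversal can be checked on data already in memory with bytes.hex() (Python 3.5+). This sketch mirrors the loop above; like the loop, it simply drops a trailing chunk shorter than 4 bytes. swap32_hex is a hypothetical name.

```python
def swap32_hex(data):
    """Reverse each complete 4-byte chunk and return the result as a hex string."""
    usable = len(data) - len(data) % 4  # ignore an incomplete trailing chunk
    return ''.join(data[i:i + 4][::-1].hex() for i in range(0, usable, 4))


print(swap32_hex(bytes.fromhex('aa000000bb000000')))  # 000000aa000000bb
```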
"Is there a more elegant way? This solution is just slightly better than the original."
Alternatively, you can craft a "smarter" struct format string. Format characters accept a numeric prefix giving the repeat count, e.g. 10L is the same as LLLLLLLLLL, so you can insert the size of your data divided by 4 before the letter and convert the whole thing in one go (or in a few steps; I don't know how large the count can be).
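A sketch of that idea, assuming the data length is already a multiple of 4 (the question's code pads with zero bytes first). With the < and > prefixes, L always means a 4-byte unsigned integer, so one unpack/pack pair swaps every word. swap32 is a hypothetical name.

```python
import struct


def swap32(data):
    """Swap the byte order of every 32-bit word using one unpack/pack pair."""
    if len(data) % 4:
        raise ValueError('length must be a multiple of 4')
    count = len(data) // 4
    # '<%dL' and '>%dL' mean `count` unsigned 4-byte ints, little- and big-endian
    words = struct.unpack('<%dL' % count, data)
    return struct.pack('>%dL' % count, *words)


print(swap32(bytes.fromhex('aa000000bb000000')).hex())  # 000000aa000000bb
```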
array.array might also work, since that's what its byteswap() method does, but you can't specify the input endianness (I think), so it's iffier.
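A sketch of the array.array route. For a pure byte swap the input endianness does not actually matter, because byteswap() just reverses the bytes inside each item. The real catch is that item sizes are platform-dependent ('L' is 8 bytes on many 64-bit Unix builds), so this sketch looks for a typecode whose itemsize is 4 and assumes one exists, as it does on common platforms. swap32_array is a hypothetical name.

```python
from array import array


def swap32_array(data):
    """Swap the byte order of each 4-byte word via array.byteswap()."""
    # Item sizes vary by platform, so pick an unsigned typecode that is 4 bytes wide.
    typecode = next(t for t in ('I', 'L') if array(t).itemsize == 4)
    words = array(typecode)
    words.frombytes(data)  # raises ValueError unless len(data) is a multiple of 4
    words.byteswap()       # reverse the bytes within every item
    return words.tobytes()


print(swap32_array(bytes.fromhex('aa000000bb000000')).hex())  # 000000aa000000bb
```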
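To answer the original question:
import re
# bindata holds the raw bytes read from the file; re.DOTALL lets '.' also match the byte 0x0A
changed = re.sub(b'(....)', lambda x: x.group()[::-1], bindata, flags=re.DOTALL)
Note: the original had r'(....)' where the r should have been b.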

Adding line when converting string to text file

I have 1000+ files with numerically consecutive names like IMAG0000.JPG that I have saved as a list, converted to a string, and saved as a text file. I want the text file to look like this:
IMAG0000.JPG
IMAG0001.JPG
IMAG0002.JPG
IMAG0003.JPG
...
Currently, it looks like:
IMAG0000.JPGIMAG0001.JPGIMAG0002.JPGIMAG0003.JPG...
I can't quite figure out where to put \n to make it format correctly. This is what I have so far...
import glob
newfiles=[]
filenames=glob.glob('*.JPG')
newfiles =''.join(filenames)
f=open('file.txt','w')
f.write(newfiles)
You join with an empty string '' instead of '\n':
newfiles = '\n'.join(filenames)
f = open('file.txt','w')
f.write(newfiles) # keep in mind to use f.close()
or safer (i.e. releasing the file handle):
with open("file.txt", "w") as f:
    f.write('\n'.join(filenames))
or, instead of concatenating everything:
with open("file.txt", "w") as f:
    for filename in filenames:
        f.write(filename + '\n')
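Since the goal is a numerically consecutive list, it may be worth sorting first: glob.glob() returns names in whatever order the file system lists them. A sketch; write_name_list is a hypothetical helper, and plain sorted() gives numeric order here only because the names are zero-padded to the same width (IMAG0000, IMAG0001, ...).

```python
import glob


def write_name_list(pattern, path):
    """Write the files matching pattern to path, sorted, one name per line."""
    filenames = sorted(glob.glob(pattern))
    with open(path, 'w') as f:
        f.writelines(name + '\n' for name in filenames)
    return filenames


# e.g. write_name_list('*.JPG', 'file.txt')
```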
Try this:
newfiles = '\n'.join(filenames)
Side note: it's good practice to use the with keyword when dealing with file objects, so the code:
f=open('file.txt','w')
f.write(newfiles)
would become:
with open('file.txt','w') as f:
    f.write(newfiles)
That way you do not need to explicitly do f.close() to close the file.

Create text file of hexadecimal from binary

I would like to convert a binary file to hexadecimal in a certain format and save it as a text file.
The end product should be something like this:
"\x7f\xe8\x89\x00\x00\x00\x60\x89\xe5\x31\xd2\x64\x8b\x52"
Input is from an executable file "a".
This is my current code:
with open('a', 'rb') as f:
    byte = f.read(1)
    hexbyte = '\\x%02s' % byte
    print hexbyte
A few issues with this:
This only prints the first byte.
The result is "\x" followed by a box containing the digits 00 7f; that is exactly how it looks in the terminal.
Why is this so? And finally, how do I save all the hexadecimals to a text file to get the end product shown above?
EDIT: I was able to save the output as text with:
txt = open('out.txt', 'w')
print >> txt, hexbyte
txt.close()
You can't inject numbers into escape sequences like that. Escape sequences are essentially constants, so they can't have dynamic parts.
There's already a module for this, anyway:
from binascii import hexlify
with open('test', 'rb') as f:
    print(hexlify(f.read()).decode('utf-8'))
Just use the hexlify function on a byte string and it'll give you a hex byte string. You need the decode to convert it back into an ordinary string.
Not quite sure if decode works in Python 2, but you really should be using Python 3, anyway.
Your output looks like a representation of a bytestring in Python returned by repr():
with open('input_file', 'rb') as file:
    print repr(file.read())
Note: some bytes are shown as ASCII characters, e.g. '\x52' == 'R'. If you want all bytes to be shown as hex escapes:
with open('input_file', 'rb') as file:
    print "\\x" + "\\x".join([c.encode('hex') for c in file.read()])
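The snippet above is Python 2 (str.encode('hex') does not exist in Python 3). In Python 3, iterating over a bytes object yields integers, so the same "\x.." text can be built with format(). The commented lines at the end show writing it to a text file, using the file names 'a' and 'out.txt' from the question; to_escaped_hex is a hypothetical helper name.

```python
def to_escaped_hex(data):
    """Render bytes as a string of \\xNN escapes, e.g. b'\\x7fE' -> '\\x7f\\x45'."""
    return ''.join('\\x{:02x}'.format(byte) for byte in data)


print(to_escaped_hex(b'\x7f\xe8\x89\x00'))  # \x7f\xe8\x89\x00

# Writing a whole file's contents out as text:
# with open('a', 'rb') as src, open('out.txt', 'w') as dst:
#     dst.write(to_escaped_hex(src.read()))
```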
Just add the content to a list and print it:
with open("default.png",'rb') as file_png:
    a = file_png.read()
l = []
l.append(a)
print l

Python3 ASCII Hexadecimal to Binary String Conversion

I'm using Python 3.2.3 on Windows, and am trying to convert binary data within a C-style ASCII file into its binary equivalent for later parsing using the struct module. For example, my input file contains "0x000A 0x000B 0x000C 0x000D", and I'd like to convert it into "\x00\x0a\x00\x0b\x00\x0c\x00\x0d".
The problem I'm running into is that the string datatypes have changed in Python 3, and the built-in functions to convert from hexadecimal to binary, such as binascii.unhexlify(), no longer accept regular unicode strings, but only byte strings. This process of converting from unicode strings to byte strings and back is confusing me, so I'm wondering if there's an easier way to achieve this. Below is what I have so far:
import binascii

with open(path, "r") as f:
    l = []
    data = f.read()
    values = data.split(" ")
    for v in values:
        if (v.startswith("0x")):
            l.append(binascii.unhexlify(bytes(v[2:], "utf-8")).decode("utf-8"))
    string = ''.join(l)
>>> ''.join(chr(int(x, 16)) for x in "0x000A 0x000B 0x000C 0x000D".split()).encode('utf-16be')
b'\x00\n\x00\x0b\x00\x0c\x00\r'
As agf says, opening the file with mode 'r' will give you string data.
Since the only thing you are doing here is looking at binary data, you probably want to open with 'rb' mode and make your result of type bytes, not str.
Something like:
import binascii

with open(path, "rb") as f:
    l = []
    data = f.read()
    values = data.split()  # split on any whitespace, including a trailing newline
    for v in values:
        if v.startswith(b"0x"):
            l.append(binascii.unhexlify(v[2:]))
    result = b''.join(l)
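Since the result is going to be parsed with struct anyway, another option is to skip unhexlify: int(token, 16) accepts the 0x prefix, and struct can pack the values as big-endian 16-bit words. A sketch assuming every value fits in 16 bits, as in the 0x000A example; parse_hex_words is a hypothetical name.

```python
import struct


def parse_hex_words(text):
    """Convert '0x000A 0x000B ...' into big-endian 16-bit binary data."""
    values = [int(token, 16) for token in text.split() if token.startswith('0x')]
    # '>%dH' = that many unsigned 16-bit ints, big-endian; struct.error if a value > 0xFFFF
    return struct.pack('>%dH' % len(values), *values)


print(parse_hex_words('0x000A 0x000B 0x000C 0x000D'))  # b'\x00\n\x00\x0b\x00\x0c\x00\r'
```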

Reading binary file (.chn) in Python

In Python, how do I read a binary file (here a .chn file) and show the result in binary format?
Assuming that the values are separated by spaces:
with open('myfile.chn', 'rb') as f:
    data = []
    for line in f:  # a file supports direct iteration
        data.extend(hex(int(x, 2)) for x in line.split())
In Python it is better to use open() than file(); the documentation says so explicitly:
When opening a file, it’s preferable to use open() instead of invoking
the file constructor directly.
rb mode will open the file in binary mode.
Reference:
http://docs.python.org/library/functions.html#open
Try this:
with open('myfile.chn', 'rb') as f:
    data = f.read()
# format each byte as exactly 8 binary digits, keeping leading zeros
data = [format(byte, '08b') for byte in bytearray(data)]
print(''.join(data))
If you only want the binary data, it will be in the list:
with open('myfile.chn', 'rb') as f:
    data = f.read()
data = [format(byte, '08b') for byte in bytearray(data)]
print(data)
data now holds the list of binary strings; you can take these and convert them to hexadecimal numbers.
with file('myfile.chn') as f:
    data = f.read()  # read the whole file into a single string
data = [hex(int(x, 2)) for x in data]  # convert each '0'/'1' character to a hex string via its integer value
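If the .chn file holds raw binary data rather than text made of '0' and '1' characters, each byte can be shown directly in binary (and hex) form. A Python 3 sketch; show_bits and its per_line parameter are hypothetical, and 'myfile.chn' is the file name from the question.

```python
def show_bits(data, per_line=4):
    """Return one 'HEX  BINARY' line per group of per_line bytes."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        hex_part = ' '.join('{:02x}'.format(b) for b in chunk)
        bin_part = ' '.join('{:08b}'.format(b) for b in chunk)
        lines.append(hex_part + '  ' + bin_part)
    return lines


for line in show_bits(b'\x00\x01\x7f\xff\x10'):
    print(line)
# 00 01 7f ff  00000000 00000001 01111111 11111111
# 10  00010000

# With the real file:
# with open('myfile.chn', 'rb') as f:
#     print('\n'.join(show_bits(f.read())))
```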
