I am trying to write a pit array to a file in python as in this example: python bitarray to and from file
however, I get garbage in my actual test file:
test_1 = ^#^#
test_2 = ^#^#
code:
from bitarray import bitarray
def test_function(myBitArray):
test_bitarray=bitarray(10)
test_bitarray.setall(0)
with open('test_file.inp','w') as output_file:
output_file.write('test_1 = ')
myBitArray.tofile(output_file)
output_file.write('\ntest_2 = ')
test_bitarray.tofile(output_file)
Any help with what's going wrong would be appreciated.
That's not garbage. The tofile function writes binary data to a binary file. A 10-bit-long bitarray with all 0's will be output as two bytes of 0. (The docs explain that when the length is not a multiple of 8, it's padded with 0 bits.) When you read that as text, two 0 bytes will look like ^#^#, because ^# is the way (many) programs represent a 0 byte as text.
If you want a human-readable text-friendly representation, use the to01 method, which returns a human-readable strings. For example:
with open('test_file.inp','w') as output_file:
output_file.write('test_1 = ')
output_file.write(myBitArray.to01())
output_file.write('\ntest_2 = ')
output_file(test_bitarray.to01())
Or maybe you want this instead:
output_file(str(test_bitarray))
… which will give you something like:
bitarray('0000000000')
Related
Totally new to python. Trying to parse a file but not all records contain data. I want to skip the records that are all hex 00.
if record == ('\x00' * 256): from a sample of print("-"*80))
gave a Syntax error, hey I said I was new. :)
Thanks for the reply, I'm using 2.7 and reading like this....
with open(testfile, "rb") as f:
counter = 0
while True:
record = f.read(256)
counter += 1
Your example looks to be very close. I'm not sure about Python 2, but in Python 3 you should specify that a string is binary.
I would do something like:
empty = b'\x00' * 256
if record == empty:
print('skipped this line')
Remember that Python 2 uses print statements, so you should do print 'skipped this line' instead.
I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.
To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)
I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.
Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]
Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.
When reading a file (UTF-8 Unicode text, csv) with Python on Linux, either with:
csv.reader()
file()
values of some columns get a zero as their first characeter (there are no zeroues in input), other get a few zeroes, which are not seen when viewing file with Geany or any other editor. For example:
Input
10016;9167DE1;Tom;Sawyer ;Street 22;2610;Wil;;378983561;tom#hotmail.com;1979-08-10 00:00:00.000;0;1;Wil;081208608;NULL;2;IZMH726;2010-08-30 15:02:55.777;2013-06-24 08:17:22.763;0;1;1;1;NULL
Output
10016;9167DE1;Tom;Sawyer ;Street 22;2610;Wil;;0378983561;tom#hotmail.com;1979-08-10 00:00:00.000;0;1;Wil;081208608;NULL;2;IZMH726;2010-08-30 15:02:55.777;2013-06-24 08:17:22.763;0;1;1;1;NULL
See 378983561 > 0378983561
Reading with:
f = file('/home/foo/data.csv', 'r')
data = f.read()
split_data = data.splitlines()
lines = list(line.split(';') for line in split_data)
print data[51220][8]
>>> '0378983561' #should have been '478983561' (reads like this in Geany etc.)
Same result with csv.reader().
Help me solve the mystery, what could be the cause of this? Could it be related to encoding/decoding?
The data you're getting is a string.
print data[51220][8]
>>> '0478983561'
If you want to use this as an integer, you should parse it.
print int(data[51220][8])
>>> 478983561
If you want this as a string, you should convert it back to a string.
print repr(int(data[51220][8]))
>>> '478983561'
csv.reader treats all columns as strings. Conversion to the appropriate type is up to you as in:
print int(data[51220][8])
I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.
To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)
I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.
Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]
Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.
I want to convert a binary file (such as a jpg, mp3, etc) to web-safe text and then back into binary data. I've researched a few modules and I think I'm really close but I keep getting data corruption.
After looking at the documentation for binascii I came up with this:
from binascii import *
raw_bytes = open('test.jpg','rb').read()
text = b2a_qp(raw_bytes,quotetabs=True,header=False)
bytesback = a2b_qp(text,header=False)
f = open('converted.jpg','wb')
f.write(bytesback)
f.close()
When I try to open the converted.jpg I get data corruption :-/
I also tried using b2a_base64 with 57-long blocks of binary data. I took each block, converted to a string, concatenated them all together, and then converted back in a2b_base64 and got corruption again.
Can anyone help? I'm not super knowledgeable on all the intricacies of bytes and file formats. I'm using Python on Windows if that makes a difference with the \r\n stuff
Your code looks quite complicated. Try this:
#!/usr/bin/env python
from binascii import *
raw_bytes = open('28.jpg','rb').read()
i = 0
str_one = b2a_base64(raw_bytes) # 1
str_list = b2a_base64(raw_bytes).split("\n") #2
bytesBackAll = a2b_base64(''.join(str_list)) #2
print bytesBackAll == raw_bytes #True #2
bytesBackAll = a2b_base64(str_one) #1
print bytesBackAll == raw_bytes #True #1
Lines tagged with #1 and #2 represent alternatives to each other. #1 seems most straightforward to me - just make it one string, process it and convert it back.
You should use base64 encoding instead of quoted printable. Use b2a_base64() and a2b_base64().
Quoted printable is much bigger for binary data like pictures. In this encoding each binary (non alphanumeric character) code is changed into =HEX. It can be used for texts that consist mainly of alphanumeric like email subjects.
Base64 is much better for mainly binary data. It takes 6 bites of first byte, then last 2 bits of 1st byte and 4 bites from 2nd byte. etc. It can be recognized by = padding at the end of the encoded text (sometimes other character is used).
As an example I took .jpeg of 271 700 bytes. In qp it is 627 857 b while in base64 it is 362 269 bytes. Size of qp is dependent of data type: text which is letters only do not change. Size of base64 is orig_size * 8 / 6.
Your documentation reference is for Python 3.0.1. There is no good reason using Python 3.0. You should be using 3.2 or 2.7. What exactly are you using?
Suggestion: (1) change bytes to raw_bytes to avoid confusion with the bytes built-in (2) check for raw_bytes == bytes_back in your test script (3) while your test should work with quoted-printable, it is very inefficient for binary data; use base64 instead.
Update: Base64 encoding produces 4 output bytes for every 3 input bytes. Your base64 code doesn't work with 56-byte chunks because 56 is not an integral multiple of 3; each chunk is padded out to a multiple of 3. Then you join the chunks and attempt to decode, which is guaranteed not to work.
Your chunking loop would be much better written as:
output_string = ''.join(
b2a_base64(raw_bytes[i:i+57]) for i in xrange(0, xrange(len(raw_bytes), 57)
)
In any case, chunking is rather slow and pointless; just do b2a_base64(raw_bytes)
#PMC's answer copied from the question:
Here's what works:
from binascii import *
raw_bytes = open('28.jpg','rb').read()
str_list = []
i = 0
while i < len(raw_bytes):
byteSegment = raw_bytes[i:i+57]
str_list.append(b2a_base64(byteSegment))
i += 57
bytesBackAll = a2b_base64(''.join(str_list))
print bytesBackAll == raw_bytes #True
Thanks for the help guys. I'm not sure why this would fail with [0:56] instead of [0:57] but I'll leave that as an exercise for the reader :P