I am running the following code on Ubuntu 11.10 with Python 2.7.2+:
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a binary file that holds the bytes of several image files jumbled together
like 123451234512345..., regrouping them, and then saving each image. The problem is that I'm getting the following error:
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
    len = i32(s)
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at the relevant code:
# line 44:
def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
# lines 88-96:
"Fetch a new chunk. Returns header information."
if self.queue:
    cid, pos, len = self.queue[-1]
    del self.queue[-1]
    self.fp.seek(pos)
else:
    s = self.fp.read(8)
    cid = s[4:]
    pos = self.fp.tell()
    len = i32(s)
I would try tinkering, but I'm afraid I'll screw up png and PIL, which have been irksome to get working.
Thanks.
It would appear that len(s) < 4 at this stage:
len = i32(s)
which means that
s = self.fp.read(8)
returned fewer than 4 bytes, even though 8 were requested.
The data in the fp you are passing probably doesn't make sense to the image decoder.
Double-check that you are slicing correctly.
Make sure that the string you are passing in has a length of at least 4.
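One way to double-check the slicing is to look at the first bytes of each slice before handing it to PIL. The sketch below is written for Python 3 and uses only the well-known leading signatures of JPEG, PNG and GIF files. The helper names sniff_format and describe_slices are made up for this illustration and are not part of PIL.

```python
# Leading "magic" bytes of the three formats the question expects.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}

def sniff_format(data):
    """Return 'jpeg', 'png', 'gif', or None, judging by the leading bytes."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None

def describe_slices(data_file, step=5):
    """Return (offset, apparent format, length) for each interleaved slice."""
    report = []
    for offset in range(step):
        piece = data_file[offset::step]
        report.append((offset, sniff_format(piece), len(piece)))
    return report
```

Calling describe_slices(dataFile) on the bytes read from the file shows at a glance whether every slice starts like the format it is about to be opened as. Note that it inspects all five offsets, while the code in the question only uses the first four.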
I am writing a program that reads bits from one file and writes them to another file. I found a library called bitstring that helps to manipulate bits as strings. This library lets me read bits, but I cannot work out how to write the bits I have read. The input and output files have the same size, so there will be no problem in terms of bytes. This is part of my code:
import bitstring
file = bitstring.ConstBitStream(filename='paper.pdf')
print(file.length)
bits_to_read = 5000000
last_bits = 0
while file.pos < file.length-bits_to_read:
    bits = file.read(bits_to_read)
    str_bits = bitstring.BitArray(bits).bin
rest = file.length - file.pos
bits = file.read(rest)
str_bits = bitstring.BitArray(bits).bin
With kind regards.
So, I have found a solution. I appended the resulting bits to one variable and then exported it. This is part of the code:
while file.pos < file.length-bits_to_read:
    bits = file.read(bits_to_read)
    str_bits = bitstring.BitArray(bits).bin
    encrypted_bits = ''.join(encrypt(str_bits, cipher))
    exported_str = exported_str + encrypted_bits
rest = file.length - file.pos
bits = file.read(rest)
str_bits = bitstring.BitArray(bits).bin
exported_str = exported_str + str_bits
exported_bits = bitstring.BitArray(bin=exported_str)
with open(output_name, 'wb') as f:
    f.write(exported_bits.tobytes())
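If the chunk size is a whole number of bytes (5000000 bits is 625000 bytes), the result does not have to be collected in one large string first. The following is only a sketch of that idea using plain bytes objects from the standard library rather than bitstring. The function name copy_transformed is hypothetical, and invert_bits is just a stand-in for whatever encrypt does in the code above.

```python
def invert_bits(chunk):
    """Stand-in transform (an assumption): flip every bit of every byte."""
    return bytes(b ^ 0xFF for b in chunk)

def copy_transformed(src_path, dst_path, transform, chunk_size=625000):
    """Read src in chunks, transform each chunk, and write it out at once."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            dst.write(transform(chunk))
```

Because each chunk is written as soon as it has been processed, memory use stays near the chunk size instead of growing with the file. This only works when the transform can treat each chunk independently.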
I am reading in a CSV file in Python that looks like this:
REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231
My problem is that when I read the first line, it is read as
REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
which at first doesn't seem like much of a problem.
But later on I look for a year, so I split the string into a list
lijst_eerste_regel = self.eerste_regel.split(",")
and then look for the index of str(2010), but Python then seems to look for '2010', not "2010". Therefore it won't find the index.
I post the code right here (the problem occurs inside a class; I am not sure whether that is relevant):
import io
class Volkstelling:
    def __init__(self,jaartal,csvb):
        """
        >>> vs2010 = Volkstelling(2010, 'vs_bevolkingsaantal.csv')
        """
        import csv
        self.jaartal = jaartal
        self.csvb = csvb
        self.eerste_regel = next(self.csvb)
        if str(jaartal) not in self.eerste_regel:
            raise AssertionError ("geen gegevens beschikbaar")
    def inwoners(self, regio):
        lijst_eerste_regel = self.eerste_regel.split(",")
        plaats_jaartal = lijst_eerste_regel.index(self.jaartal) # here is where the error occurs
data = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""
v = Volkstelling('2010',io.StringIO(data))
v.inwoners('Alabama')
## ValueError: '2010' not in list
Your code had several issues that led to 2010 not being found:
When you read a file, each line has a newline character, commonly written as \n, at the end. Insert the following code into your inwoners function to see the newline character after 2010:
print(lijst_eerste_regel)
You can remove leading and trailing whitespace, including newlines, with the Python string method 'SOME STRING'.strip().
Your function did not return a value, so you would get None from inwoners even if it ran correctly.
The following example works:
import io
class Volkstelling:
    def __init__(self,jaartal,csvb):
        """
        >>> vs2010 = Volkstelling(2010, 'vs_bevolkingsaantal.csv')
        """
        import csv
        self.jaartal = jaartal
        self.csvb = csvb
        self.eerste_regel = next(self.csvb)
        if str(jaartal) not in self.eerste_regel:
            raise AssertionError ("geen gegevens beschikbaar")
    def inwoners(self, regio):
        lijst_eerste_regel = [s.strip() for s in self.eerste_regel.split(",")]
        plaats_jaartal = lijst_eerste_regel.index(self.jaartal)
        return plaats_jaartal # Returns the column index where to find the no of inhabitants
data = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""
v2 = Volkstelling('1920',io.StringIO(data))
print(v2.inwoners('Alabama'))
## -> prints 2
v1 = Volkstelling('2010',io.StringIO(data))
print(v1.inwoners('Alabama'))
## -> prints 11
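Since the class already imports csv, another option is to let csv.reader do the splitting; it returns each row as a list of fields with the line ending already removed. The sketch below assumes the same sample data as above. The function name inwoners_in_jaar is made up for this illustration, and it goes one step further than the example above by returning the population figure itself rather than the column index.

```python
import csv
import io

DATA = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""

def inwoners_in_jaar(csv_file, jaartal, regio):
    """Return the population of regio in jaartal as an int."""
    reader = csv.reader(csv_file)
    header = next(reader)                # fields arrive without the newline
    column = header.index(str(jaartal))  # ValueError if the year is unknown
    for row in reader:
        if row[0] == regio:
            return int(row[column])
    raise KeyError(regio)

print(inwoners_in_jaar(io.StringIO(DATA), 2010, 'Alabama'))
```

Converting jaartal with str() means the lookup works whether the year is passed as 2010 or '2010'.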
I've got some data in a binary file that I need to parse. The data is separated into chunks of 22 bytes, so I'm trying to generate a list of tuples, each tuple containing 22 values. The file isn't separated into lines though, so I'm having problems figuring out how to iterate through the file and grab the data.
If I do this it works just fine:
nextList = f.read(22)
newList = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList)
where newList contains a tuple of 22 values. However, if I try to apply similar logic to a function that iterates through, it breaks down.
def getAllData():
    listOfAll = []
    nextList = f.read(22)
    while nextList != "":
        listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
        nextList = f.read(22)
    return listOfAll
data = getAllData()
gives me this error:
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    data = getAllData()
  File "<pyshell#26>", line 5, in getAllData
    listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
struct.error: unpack requires a bytes object of length 22
I'm fairly new to Python, so I'm not too sure where I'm going wrong here. I know for sure that the data in the file breaks down evenly into sections of 22 bytes, so that's not the problem.
Since you reported that it was still running when len(nextList) == 0, this is probably because nextList (which isn't a list) is an empty bytes object, which isn't equal to an empty string object:
>>> b"" == ""
False
and so the condition in your line
while nextList != "":
is never false, even when nextList is empty. That's why using len(nextList) != 22 as a break condition worked, and even
while nextList:
should suffice.
read(22) isn't guaranteed to return a string of length 22. Its contract is to return a string of any length from 0 to 22 (inclusive). A string of length zero indicates that there is no more data to be read. In Python 3, file objects opened in binary mode produce bytes objects instead of str, and a str and a bytes object are never considered equal.
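Both points can be folded into the read loop: test the chunk for emptiness rather than comparing it with "", and check its length before unpacking. This is a minimal Python 3 sketch; the names RECORD and read_records are made up for this illustration, and "22B" is simply a shorter spelling of twenty-two "B" codes.

```python
import io
import struct

RECORD = struct.Struct("22B")  # 22 unsigned bytes per record

def read_records(f):
    """Return a list of 22-value tuples read from binary file object f."""
    records = []
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:  # b"" is falsy, so this detects end of file
            break
        if len(chunk) != RECORD.size:
            raise ValueError("partial record of %d bytes" % len(chunk))
        records.append(RECORD.unpack(chunk))
    return records

print(len(read_records(io.BytesIO(bytes(range(44))))))
```

Raising on a short chunk is one possible policy; silently stopping or padding would be equally easy to swap in.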
If your file is small-ish, then you'd be better off reading the entire file into memory and then splitting it up into chunks, e.g.
listOfAll = []
data = f.read()
for i in range(0, len(data), 22):
    t = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", data[i:i+22])
    listOfAll.append(t)
Otherwise you will need to do something more complicated with checking the amount of data you get back from the read.
def dataiter(f, chunksize=22, buffersize=4096):
    data = b''
    while True:
        newdata = f.read(buffersize)
        if not newdata: # end of file
            if not data:
                return
            else:
                yield data
                # or raise error as 0 < len(data) < chunksize
                # or pad with zeros to chunksize
                return
        data += newdata
        i = 0
        while len(data) - i >= chunksize:
            yield data[i:i+chunksize]
            i += chunksize
        try:
            data = data[i:] # keep remainder of unused data
        except IndexError:
            data = b'' # all data was used
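For the read-everything-into-memory variant, Python 3.4 and later also offer struct.iter_unpack, which walks a buffer in steps of the struct size. It requires the buffer length to be an exact multiple of that size, so the sketch below trims any trailing partial record first. The function name unpack_all is hypothetical.

```python
import struct

def unpack_all(data, fmt="22B"):
    """Unpack every complete fmt-sized record in data into a list of tuples."""
    size = struct.calcsize(fmt)
    usable = len(data) - len(data) % size  # drop a trailing partial record
    return list(struct.iter_unpack(fmt, data[:usable]))
```

Dropping the remainder silently is a choice made for this sketch; if the file is known to divide evenly into records, raising an error on a non-zero remainder may be the safer option.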
I'm programming a small script that is meant to open a binary file, find an often-changing binary blob, and copy just that blob to a new file.
Here's the layout of the binary file:
-JUNK (Unknown Size) (Unknown Contents)
-3-byte HEADER containing encoded size of blob
-PADDING (Unknown Size) (Every byte is FF in hex)
-Start of blob (72 bytes) (Unknown Contents)
-16 bytes that are ALWAYS the same
-End of blob (Size can be determined from subtracting (72+16) from value HEADER) (Unknown Contents)
-JUNK (Unknown Size) (Unknown Contents)
Here's the code I've written so far:
from sys import argv
import binascii
import base64
InputFileName = argv[1]
with open(InputFileName, 'rb') as InputFile:
    Constant16 = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")
    Constant16Offset = InputFile.read().find(Constant16)
    InputFile.seek(Constant16Offset)
    InputFile.seek(-72,1)
    InputFile.seek(-1,1)
    FFTestVar = InputFile.read(1)
    while FFTestVar == b'\xFF':
        InputFile.seek(-2,1)
        FFTestVar = InputFile.read(1)
    InputFile.seek(-3,1)
    BlobSizeBin = InputFile.read(3)
    BlobSizeHex = binascii.b2a_hex(BlobSizeBin)
    BlobSizeDec = int(BlobSizeHex, 16)
    InputFile.seek(Constant16Offset)
    InputFile.seek(-72,1)
    Blob = InputFile.read(BlobSizeDec)
    with open('output.bin', 'wb') as OutputFile:
        OutputFile.write(Blob)
Unfortunately, the while loop is SLOW. InputFile could be up to 24MB large, and the padding could be a huge chunk of that. Going through it one byte at a time is ridiculously slow.
I'm thinking that there's probably a better way of doing this, but an hour or two of Googling hasn't been helpful.
Thanks!
You can read the whole file into memory (you actually already do):
data = InputFile.read()
Then you can treat data like an ordinary string (it is not a unicode string but an array of bytes, which is unfortunately called str under Python 2.x). You need to remember the offset, so we will create an offset variable. Every line that looks like InputFile.seek(xx) must be translated into offset = xx, and InputFile.seek(xx, 1) into offset += xx.
magic_number = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")
offset = magic_number_offset = data.find(magic_number)
offset -= 72
Then, instead of the while loop, use the re module (you need to import it):
pattern = re.compile("[^\xFF]\xFF*$")
offset = pattern.search(data, endpos=offset).start() + 1
And the rest of the code is:
offset -= 3
blob_size_bin = data[offset:offset+3]
blob_size_hex = binascii.b2a_hex(blob_size_bin)
blob_size_dec = int(blob_size_hex, 16)
offset = magic_number_offset - 72
blob = data[offset:offset+blob_size_dec]
If the files are really big and the Python process consumes a lot of memory, you can use the mmap module instead of loading the whole file into memory.
If this solution is still slow, you can reverse the order of your data (reversed_data = data[::-1]) and search for the pattern [^\xFF].
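A different way to skip the padding, without a regular expression, is to let bytes.rstrip remove the trailing FF bytes from everything that precedes the blob. The sketch below is for Python 3 and follows the file layout from the question. The function name extract_blob is made up for this illustration. Like the original loop, it assumes that the last byte of the 3-byte header is not itself FF.

```python
import base64

# The 16-byte constant from the question.
MAGIC = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")

def extract_blob(data, magic=MAGIC, lead=72):
    """Return the blob from data, laid out as described in the question."""
    magic_offset = data.find(magic)
    if magic_offset < 0:
        raise ValueError("16-byte constant not found")
    blob_start = magic_offset - lead
    # Strip the FF padding; what remains ends with the 3-byte header.
    header_end = len(data[:blob_start].rstrip(b"\xff"))
    if header_end < 3:
        raise ValueError("no room for a 3-byte header")
    # Big-endian, matching int(binascii.b2a_hex(...), 16) in the question.
    blob_size = int.from_bytes(data[header_end - 3:header_end], "big")
    return data[blob_start:blob_start + blob_size]
```

The slice data[:blob_start] makes a temporary copy of the leading part of the file, which should be acceptable for inputs of around 24MB; rstrip itself runs in C rather than in a Python-level loop.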
I wrote a python script to create a binary file of integers.
import struct
pos = [7623, 3015, 3231, 3829]
inh = open('test.bin', 'wb')
for e in pos:
    inh.write(struct.pack('i', e))
inh.close()
It worked well, then I tried to read the 'test.bin' file using the below code.
import struct
inh = open('test.bin', 'rb')
for rec in inh:
    pos = struct.unpack('i', rec)
    print pos
inh.close()
But it failed with an error message:
Traceback (most recent call last):
  File "readbinary.py", line 10, in <module>
    pos = struct.unpack('i', rec)
  File "/usr/lib/python2.5/struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 4
I would like to know how I can read this file using struct.unpack.
Many thanks in advance,
Vipin
for rec in inh: reads one line at a time, which is not what you want for a binary file. Read 4 bytes at a time instead (with a while loop and inh.read(4)), or read everything into memory with a single .read() call and then unpack successive 4-byte slices. The second approach is simplest and most practical as long as the amount of data involved isn't huge:
import struct
with open('test.bin', 'rb') as inh:
    data = inh.read()
for i in range(0, len(data), 4):
    pos = struct.unpack('i', data[i:i+4])
    print(pos)
If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:
import struct
def by4(f):
    rec = 'x' # placeholder for the `while`
    while rec:
        rec = f.read(4)
        if rec: yield rec
with open('test.bin', 'rb') as inh:
    for rec in by4(inh):
        pos = struct.unpack('i', rec)
        print(pos)
A key advantage to this second approach is that the by4 generator can easily be tweaked (while maintaining the specs: return a binary file's data 4 bytes at a time) to use a different implementation strategy for buffering, all the way to the first approach (read everything then parcel it out) which can be seen as "infinite buffering" and coded:
def by4(f):
    data = f.read()
    for i in range(0, len(data), 4):
        yield data[i:i+4]
while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer (which gets encapsulated within the generator).
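Another possible implementation of the same generator contract uses the two-argument form of the built-in iter, which keeps calling a function until it returns a sentinel value. The Python 3 sketch below generalises by4 to any record size; the names by_size and read_ints are made up for this illustration.

```python
import io
import struct
from functools import partial

def by_size(f, size):
    """Return an iterator over successive size-byte chunks of binary file f."""
    # iter(callable, sentinel) calls f.read(size) until it returns b"".
    return iter(partial(f.read, size), b"")

def read_ints(f, fmt="i"):
    """Unpack every record of format fmt from f into a list of ints."""
    size = struct.calcsize(fmt)
    return [struct.unpack(fmt, rec)[0] for rec in by_size(f, size)]

packed = b"".join(struct.pack("i", e) for e in [7623, 3015, 3231, 3829])
print(read_ints(io.BytesIO(packed)))
```

As with by4, a file whose length is not a multiple of the record size yields a short final chunk, and struct.unpack then raises struct.error.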
I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:
while True:
    rec = inh.read(4) # Or inh.read(struct.calcsize('i'))
    if len(rec) != 4:
        break
    (pos,) = struct.unpack('i', rec)
    print pos
Or, as others have mentioned, with unpack_from, which takes a buffer and an offset rather than a file object:
data = inh.read()
offset = 0
while True:
    try:
        (pos,) = struct.unpack_from('i', data, offset)
    except struct.error: # fewer than 4 bytes left
        break
    print pos
    offset += 4
Check the size of the packed integers:
>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']
We see 4-byte strings, which means the file should be read 4 bytes at a time:
>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>>
This is the original int! Extending this into a reading loop is left as an exercise.
You can probably use array as well if you want:
import array
pos = array.array('i', [7623, 3015, 3231, 3829])
inh = open('test.bin', 'wb')
pos.tofile(inh) # tofile replaces the deprecated write method
inh.close()
Then use array.array.fromfile or fromstring (frombytes in Python 3) to read it back.
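A complete round trip with array might look like the following Python 3 sketch. The helper names write_int_array and read_int_array are made up for this illustration. The item count passed to fromfile is derived from the file size, since fromfile needs to be told how many items to read.

```python
import array
import os

def write_int_array(path, values):
    """Write values to path as native C ints."""
    with open(path, "wb") as f:
        array.array("i", values).tofile(f)

def read_int_array(path):
    """Read back every native C int stored in path as a list."""
    result = array.array("i")
    count = os.path.getsize(path) // result.itemsize
    with open(path, "rb") as f:
        result.fromfile(f, count)
    return result.tolist()
```

Both struct's 'i' and array's 'i' use the platform's native int size and byte order, so a file written this way is only guaranteed to read back correctly on a machine with the same layout.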
This function reads all bytes from a file:
import array
import os

def read_binary_file(filename):
    try:
        f = open(filename, 'rb')
        n = os.path.getsize(filename)
        data = array.array('B')
        data.fromfile(f, n) # fromfile replaces the deprecated read method
        f.close()
        fsize = len(data)
        return (fsize, data)
    except IOError:
        return (-1, [])

# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]
if (fsize > 0):
    data = t[1]
    # work with data
else:
    print('Error reading file')
Your iterator isn't reading 4 bytes at a time so I imagine it's rather confused. Like SilentGhost mentioned, it'd probably be best to use unpack_from().