pickle.load - EOFError: Ran out of input - python

I have an .obj file into which I previously transformed an image to base64 and saved it with pickle.
The problem appears when I try to load the .obj file with pickle, decode the base64 data back into an image, and load that image with pygame.
The function that loads the image:
def mainDisplay_load(self):
    main_folder = path.dirname(__file__)
    img_main_folder = path.join(main_folder, "sd_graphics")
    # loadImg
    self.mainTerminal = pg.image.load(path.join(img_main_folder, self.main_uncode("tr.obj"))).convert_alpha()
The function that decodes the file:
def main_uncode(self, object):
    openFile = open(object, "rb")
    str = pickle.load(openFile)
    openFile.close()
    fileData = base64.b64decode(str)
    return fileData
The error I get when the code is run:
str = pickle.load(openFile)
EOFError: Ran out of input
How can I fix it?
Python version: 3.6.2
Pygame version: 1.9.3
Update 1
This is the code I used to create the .obj file:
import base64, pickle

with open("terminal.png", "rb") as imageFile:
    str = base64.b64encode(imageFile.read())
    print(str)

file_pi = open("tr.obj", "wb")
pickle.dump(str, file_pi)
file_pi.close()

file_pi2 = open("tr.obj", "rb")
str2 = pickle.load(file_pi2)
file_pi2.close()

imgdata = base64.b64decode(str2)
filename = 'some_image.jpg'  # I assume you have a way of picking unique filenames
with open(filename, 'wb') as f:
    f.write(imgdata)
Once the file is created, it is loaded again and a second image is written out, to check whether the image survives the conversion without errors.
As you can see, I reused part of that code to load the image, but instead of saving it to disk it is loaded into pygame, and that is where the error occurs.
Update 2
I finally managed to solve it.
In the main code:
def mainDisplay_load(self):
    self.all_graphics = pg.sprite.Group()
    self.graphics_menu = pg.sprite.Group()
    # loadImg
    self.img_mainTerminal = mainGraphics(self, 0, 0, "sd_graphics/tr.obj")
In the library containing graphics classes:
import pygame as pg
import base64 as bs
import pickle as pk
from io import BytesIO as by
from lib.setting import *

class mainGraphics(pg.sprite.Sprite):
    def __init__(self, game, x, y, object):
        self.groups = game.all_graphics, game.graphics_menu
        pg.sprite.Sprite.__init__(self, self.groups)
        self.game = game
        self.object = object
        self.outputGraphics = by()
        self.x = x
        self.y = y
        self.eventType()
        self.rect = self.image.get_rect()
        self.rect.x = self.x * tilesizeDefault
        self.rect.y = self.y * tilesizeDefault

    def eventType(self):
        openFile = open(self.object, "rb")
        str = pk.load(openFile)
        openFile.close()
        self.outputGraphics.write(bs.b64decode(str))
        self.outputGraphics.seek(0)
        self.image = pg.image.load(self.outputGraphics).convert_alpha()
As for the question of why I would do such a thing, it is simple:
any attacker with sufficient motivation can still get to it easily
Python is free and open.
On the one hand, we have a person who deliberately sets out to modify and recover the hidden data. But since Python is an open language, then just as with even more complicated and protected languages, the most motivated people are able to crack the game or program and retrieve that same data.
On the other hand, we have a person who knows only the basics, or not even that: a person who cannot get at the files without learning more about the language or about decoding the files.
So you can understand that, from my point of view, the encoded files do not need to be protected from a motivated person, because even with a more complex and protected language that person will be able to get what he wants. The protection is aimed at people who have no knowledge of the language.

So, if the error you get is indeed "EOFError: Ran out of input", that probably means you messed up your directories in the code above, and you are trying to read an empty file that has the same name as your .obj file.
Actually, as it is, this line in your code:
self.mainTerminal = pg.image.load(path.join(img_main_folder, self.main_uncode("tr.obj"))).convert_alpha()
is completely messed up. Just read it and you can see the problem: you are passing to the main_uncode method just the file name, without any directory information. And then, even if that had worked by chance, as I pointed out in the comments a while ago, you would be trying to use the unserialized image data itself as a filename from which to read your image. (You, or someone else, probably intended main_uncode to write the image data to a temporary file so that Pygame could read it, but as it stands it just returns the raw image data in a string.)
Therefore, fixing the call above to pass an actual path to main_uncode, and further modifying main_uncode to write the temporary data to a file and return that file's path, would fix the snippets of code above.
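A minimal sketch of that fix, assuming a temporary file is acceptable (the tempfile handling below is only an illustration; the rest follows the question's names):
import base64, pickle, tempfile
from os import path

def main_uncode(self, filename):
    # Build the full path here instead of passing only the bare file name
    full_path = path.join(path.dirname(__file__), "sd_graphics", filename)
    with open(full_path, "rb") as f:
        encoded = pickle.load(f)
    img_bytes = base64.b64decode(encoded)
    # Write the decoded bytes to a temporary file and return its path,
    # so that pygame.image.load() receives a file name it can open
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    tmp.write(img_bytes)
    tmp.close()
    return tmp.name

# the call site then becomes:
# self.mainTerminal = pg.image.load(self.main_uncode("tr.obj")).convert_alpha()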
Second thing: I can't figure out why you need this ".obj" file at all. If it is just for "security through obscurity", hoping that people who get your bundled files can't open the images, that is far from a recommended practice. To sum it up in one point: it will delay legitimate uses of your file (you yourself do not seem to be able to use it), while any attacker with sufficient motivation can still get to it easily. By opening an image, base64-encoding it and pickling that, and then doing the reverse process, you are essentially performing a no-operation. Moreover, a pickle file can serialize and write complex Python objects to disk, but a base64 serialization of an image could be written directly to a file, with no need for pickle at all.
Third thing: just use with to open all the files, not only the ones you read with the imaging library. Take your time to learn a little bit more about Python.
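To illustrate those last two points, here is a minimal sketch with no pickle step at all, and with the with statement handling every file (reusing the file names from the question):
import base64

# encode once, writing the base64 text straight to disk -- no pickle involved
with open("terminal.png", "rb") as image_file:
    encoded = base64.b64encode(image_file.read())
with open("tr.obj", "wb") as out_file:
    out_file.write(encoded)

# decode later: read the bytes back and undo the base64 step
with open("tr.obj", "rb") as in_file:
    img_data = base64.b64decode(in_file.read())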

Related

How to use helper files in PyCharm

I am trying to follow along with a project written by Mike Smales - "Sound Classification using Deep Learning". In it, the author wrote a helper file called wavfilehelper.py:
wavfilehelper.py code
import struct

class WavFileHelper():
    def read_file_properties(self, filename):
        wave_file = open(filename, "rb")
        riff = wave_file.read(12)
        fmt = wave_file.read(36)

        num_channels_string = fmt[10:12]
        num_channels = struct.unpack('<H', num_channels_string)[0]

        sample_rate_string = fmt[12:16]
        sample_rate = struct.unpack("<I", sample_rate_string)[0]

        bit_depth_string = fmt[22:24]
        bit_depth = struct.unpack("<H", bit_depth_string)[0]

        return (num_channels, sample_rate, bit_depth)
In his main program he calls the helper file like this:
from helpers.wavfilehelper import WavFileHelper
wavfilehelper = WavFileHelper()
However, when I run this block of code in PyCharm, it complains "ModuleNotFoundError: No module named 'helpers.wavfilehelper'"... How can I get this helper file to work in the PyCharm environment? Do I have to put the wavfilehelper.py file in a special folder for it to be found?
Any help will be greatly appreciated!
It is important to look at (and quote in your question) the actual error messages! In this case, which line is in error? It is not the instantiation line but the import: Python is unable to find the module on your machine (using its system paths).
Earlier in the article, the author talks about downloading his files from GitHub (to your machine). Did you follow that step?
Web.Ref: further information about solving this error
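For example, one layout that makes that import resolve is sketched below (an assumption on my part: the downloaded file goes into a helpers package next to the script you run):
project/
    main.py                  # the script you run in PyCharm
    helpers/
        __init__.py          # may be empty; marks "helpers" as a package
        wavfilehelper.py     # the file from the article's GitHub repository

# main.py
from helpers.wavfilehelper import WavFileHelper
wavfilehelper = WavFileHelper()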

making a memory only fileobject in python with pyfilesystem

I have written a motion detection/video program using opencv2 which saves video output for x seconds. If motion is detected during that time, the output is saved under an alternate file name, but if no motion is detected then the file is overwritten. To avoid needless wear on a flash-based memory system, I want to write the file to RAM, and only if motion is detected save it to non-volatile memory.
I am trying to create this file in the RAM using pyfilesystem-fs.memoryfs
import numpy as np
import cv2, time, os, threading, thread
from Tkinter import *
from fs.memoryfs import MemoryFS

class stuff:
    mem = MemoryFS()
    output = mem.createfile('output.avi')
    rectime = 0
    delay = 0
    kill = 0
    cap = cv2.VideoCapture(0)
    #out = cv2.VideoWriter('C:\motion\\output.avi',cv2.cv.CV_FOURCC('F','M','P','4'), 30, (640,480),True)
    out = cv2.VideoWriter(output, cv2.cv.CV_FOURCC('F','M','P','4'), 30, (640,480), True)
This is the motion detection part
if value > 100:
    print "saving"
    movement = time.time()
    while time.time() < int(movement) + stuff.rectime:
        stuff.out.write(frame)
        ret, frame = stuff.cap.read()
    if stuff.out.isOpened() is True:
        stuff.out.release()
    os.rename(stuff.output, 'c:\motion\\' + time.strftime('%m-%d-%y_%H-%M-%S') + '.avi')
The os.rename call raises TypeError: must be string, not None.
I'm clearly using MemoryFS incorrectly, but cannot find any examples of its use.
EDIT
I use the following line to open the file object and write to it
stuff.out.open(stuff.output, cv2.cv.CV_FOURCC(*'FMP4'),24,(640,480),True)
However, this returns False; I'm not sure why, but it appears it can't open the file object.
To move your file from MemoryFS to the real file system, you should read the original file and write it to the destination file, something like:
with mem.open('output.avi', 'rb') as orig:
    with open('c:\\motion\\' + time.strftime('%m-%d-%y_%H-%M-%S') + '.avi', 'wb') as dest:
        dest.write(orig.read())
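If the clip can be large, the same copy can be done in chunks instead of one big read(); a sketch using shutil, with the same file names as above:
import shutil, time

with mem.open('output.avi', 'rb') as orig:
    with open('c:\\motion\\' + time.strftime('%m-%d-%y_%H-%M-%S') + '.avi', 'wb') as dest:
        shutil.copyfileobj(orig, dest)   # copies in fixed-size chunks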

Python2.7: Trying to pickle set of custom objects

I'm sort of a Python noob, so bear with me. I have a set of custom classes, each of which basically wraps and adds some functionality to an image file that has been converted to a numpy.ndarray. Since it takes about 2 minutes to create all these objects each time the script is run, I was hoping to create a list of them and pickle that list. The pickling seems to go well; the unpickling fails.
This is all I'm doing:
Pickling
frame_jar_file = open(os.path.join(asset_path, "frame_jar.pkl"), "w+")
for x in range(1, 500):
    path = os.path.join(img_path, "{0}.jpg".format(str(x).zfill(8)))
    surface = NumpySurface(path)
    self.scene_surfaces.append(surface)
frame_jar = cPickle.Pickler(frame_jar_file, -1)  # have tried this with no protocol arg as well
frame_jar.dump(self.scene_surfaces)
frame_jar_file.close()
exit()
Produces a file about 2gb in size, which seems about right to me given the data.
Unpickling
self.scene_surfaces = cPickle.Unpickler(os.path.join(asset_path, "frame_jar.pkl"))
Provokes this error:
TypeError: argument must have 'read' and 'readline' attributes
You need to pass in an open file object, not the filename:
with open(os.path.join(asset_path, "frame_jar.pkl"), 'rb') as infh:
    unpickler = cPickle.Unpickler(infh)
    self.scene_surfaces = unpickler.load()
I also assumed you wanted to load the data, not just create an unpickler.
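The writing side follows the same pattern; a minimal sketch, reusing the names from the question and opening the file in binary mode ('wb'), since protocol -1 produces binary pickle data:
with open(os.path.join(asset_path, "frame_jar.pkl"), 'wb') as outfh:
    pickler = cPickle.Pickler(outfh, -1)
    pickler.dump(self.scene_surfaces)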

Is this a python 3 file bug?

Is this a bug? The code below demonstrates what happens when you use libtiff to extract an image from an open tiff file handle. It works in Python 2.x and does not work in Python 3.2.3.
import os

# any file will work here, since it's not actually loading the tiff
# assuming it's big enough for the seek
filename = "/home/kostrom/git/wiredfool-pillow/Tests/images/multipage.tiff"

def test():
    fp1 = open(filename, "rb")
    buf1 = fp1.read(8)
    fp1.seek(28)
    fp1.read(2)

    for x in range(16):
        fp1.read(12)
    fp1.read(4)

    fd = os.dup(fp1.fileno())
    os.lseek(fd, 28, os.SEEK_SET)
    os.close(fd)

    # this magically fixes it: fp1.tell()
    fp1.seek(284)
    expect_284 = fp1.tell()
    print ("expected 284, actual %d" % expect_284)

test()
The output which I feel is in error is:
expected 284, actual -504
Uncommenting the fp1.tell() has some ... side effect ... which stabilizes the py3 handle, and I don't know why. I'd also appreciate it if someone could test other versions of Python 3.
No, this is not a bug. The Python 3 io library, which provides you with the file object from an open() call, gives you a buffered file object. For binary files, you are given a (subclass of) io.BufferedIOBase.
The Python 2 file object is far more primitive, although you can use the io library there too.
By seeking at the OS level you are bypassing the buffer and are mucking up the internal state. Generally speaking, as the doctor said to the patient complaining that pinching his skin hurts: don't do that.
If you have a pressing need to do this anyway, at the very least use the underlying raw file object (a subclass of the io.RawIOBase class) via the buffered object's raw attribute:
fp1 = open(filename, "rb").raw
os.dup creates a duplicate file descriptor that refers to the same open file description. Therefore, os.lseek(fd, 28, os.SEEK_SET) changes the seek position of the file underlying fp1.
Python's file objects cache the file position to avoid repeated system calls. The side effect of this is that changing the file position without using the file object methods will desynchronize the cached position and the real position, leading to nonsense like you've observed.
Worse yet, because the files are internally buffered by Python, seeking outside the file methods could actually cause the returned file data to be incorrect, leading to corruption or other nasty stuff.
The documentation in bufferedio.c notes that tell can be used to reinitialize the cached value:
* The absolute position of the raw stream is cached, if possible, in the
`abs_pos` member. It must be updated every time an operation is done
on the raw stream. If not sure, it can be reinitialized by calling
_buffered_raw_tell(), which queries the raw stream (_buffered_raw_seek()
also does it). To read it, use RAW_TELL().
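So the workaround hinted at in the question's own comment amounts to something like this sketch: call tell() on the buffered object right after the OS-level seek, before relying on its position again.
fd = os.dup(fp1.fileno())
os.lseek(fd, 28, os.SEEK_SET)   # moves the shared OS-level position
os.close(fd)
fp1.tell()                      # re-queries the raw stream, refreshing the cached position
fp1.seek(284)                   # now lands where expected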

How to write a large amount of data in a tarfile in python without using temporary file

I've written a small cryptographic module in Python whose task is to encrypt a file and put the result in a tarfile. The original file to encrypt can be quite large, but that's not a problem because my program only needs to work with a small block of data at a time, which can be encrypted on the fly and stored.
I'm looking for a way to avoid doing it in two passes, first writing all the data in a temporary file then inserting result in a tarfile.
Basically I do the following (where generator_encryptor is a simple generator that yields chunks of data read from sourcefile):
t = tarfile.open("target.tar", "w")
tmp = file('content', 'wb')
for chunk in generator_encryptor("sourcefile"):
    tmp.write(chunk)
tmp.close()
t.add('content')
t.close()
I'm a bit annoyed at having to use a temporary file, as I feel it should be easy to write blocks directly into the tar file. But collecting every chunk into a single string and using something like t.addfile('content', StringIO(bigcipheredstring)) seems excluded, because I can't guarantee that I have enough memory to hold bigcipheredstring.
Any hint on how to do that?
You can create your own file-like object and pass it to TarFile.addfile. Your file-like object will generate the encrypted contents on the fly in the fileobj.read() method.
Huh? Can't you just use the subprocess module to run a pipe through to tar? That way, no temporary file should be needed. Of course, this won't work if you can't generate your data in small enough chunks to fit in RAM, but if you have that problem, then tar isn't the issue.
Basically, using a file-like object and passing it to TarFile.addfile does the trick, but there are still some open issues:
I need to know the full encrypted file size at the beginning.
The way tarfile accesses the read method is such that the custom file-like object must always return full read buffers, or tarfile assumes it is the end of the file. This leads to some really inefficient buffer copying in the code of the read method, but it's either that or change the tarfile module.
The resulting code is below. Basically, I had to write a wrapper class that transforms my existing generator into a file-like object. I also added the GeneratorEncryptor class to my example to make the code complete. You can notice it has a len method that returns the length of the written file (but understand it's just a dummy placeholder that does nothing useful).
import tarfile

class GeneratorEncryptor(object):
    """Dummy class for testing purpose
    The real one perform on the fly encryption of source file
    """
    def __init__(self, source):
        self.source = source
        self.BLOCKSIZE = 1024
        self.NBBLOCKS = 1000

    def __call__(self):
        for c in range(0, self.NBBLOCKS):
            yield self.BLOCKSIZE * str(c % 10)

    def __len__(self):
        return self.BLOCKSIZE * self.NBBLOCKS

class GeneratorToFile(object):
    """Transform a data generator into a conventional file handle
    """
    def __init__(self, generator):
        self.buf = ''
        self.generator = generator()

    def read(self, size):
        chunk = self.buf
        while len(chunk) < size:
            try:
                chunk = chunk + self.generator.next()
            except StopIteration:
                self.buf = ''
                return chunk
        self.buf = chunk[size:]
        return chunk[:size]

t = tarfile.open("target.tar", "w")
tmp = file('content', 'wb')
generator = GeneratorEncryptor("source")
ti = t.gettarinfo(name="content")
ti.size = len(generator)
t.addfile(ti, fileobj=GeneratorToFile(generator))
t.close()
I guess you need to understand how the tar format works, and handle the tar writing yourself. Maybe this can be helpful?
http://mail.python.org/pipermail/python-list/2001-August/100796.html
