I have a case similar to this question.
I faced it when training a deep learning model and reading images inside the training loop.
scipy.misc.imread(rawImagePath).astype(numpy.float32)
The code above works perfectly the vast majority of the time, but sometimes I get this error:
batch 90, loading batch data...
x path: /path_to_dataset/train/9.png
2017-10-31 20:23:34.531221: W tensorflow/core/kernels/queue_base.cc:294] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "train.py", line 452, in <module>
batch_X = load_data(batch_X_paths)
File "/path_to_my_script/utilities.py", line 85, in load_data
x = misc.imread(batch_X_paths[i]).astype(np.float32)
TypeError: float() argument must be a string or a number
The difference is: I have a fixed set of files and I do not write new ones to the directory.
I did a little research and figured out that the error comes not from
numpy.astype()
but from
PIL.Image.open()
scipy.misc uses PIL under the hood.
Also, I managed to reproduce this bug in a Jupyter Notebook. I made a loop with the reading/astype code inside. After a few seconds (my test .png is about 10MB), I interrupt the cell's execution and voila!
[screenshot: error reproduced in Jupyter Notebook]
Inside the scipy.misc sources there is code like this:
def imread(name, flatten=False, mode=None):
    im = Image.open(name)
    return fromimage(im, flatten=flatten, mode=mode)
Image here is a PIL object, and Image.open() gives us a 0-dimensional list when the cell is interrupted.
I have tried Evert's advice and made a retry loop with try/except, but sometimes it fails even after 5 attempts.
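For reference, a minimal sketch of the retry loop I mean (the attempt count, sleep interval and helper name are arbitrary choices of mine):

import time
import numpy as np
from scipy import misc

def load_image(path, attempts=5):
    # the intermittent failure happens inside PIL.Image.open(),
    # so a fresh attempt sometimes succeeds
    for attempt in range(attempts):
        try:
            return misc.imread(path).astype(np.float32)
        except TypeError:
            time.sleep(0.1)  # brief pause before retrying
    raise IOError('could not read %s after %d attempts' % (path, attempts))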
Also, I tried OpenCV's imread() method instead of scipy.misc's, and sometimes it fails too. The error text:
libpng error: incorrect data check
So... I ran out of ideas.
Related
Hi, I am new to computer vision and Stack Overflow, and I have a problem with my Python 3 program on Windows: the cv2.findContours() function returns 2 values instead of three as in the documentation. I unpacked 2 return values to work around the bug. The type of the first (image) is a list and that of the second (cnts) is int32, but neither of them can be used in cv2.drawContours() without erroring. I use image as the parameter in cv2.drawContours() because it is the only list returned, so I guess it is the contours list. So here is the code:
# This is the program for a document scanner, to extract a document
# from any image and apply a perspective transform to show it as the final result
import numpy as np
import cv2
import imutils
from matplotlib import pyplot as plt

cap = cv2.VideoCapture(0)
ret, img = cap.read()
img1 = img.copy()
cv2.imshow('Image', img1)
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img1 = cv2.bilateralFilter(img1, 7, 17, 17)
img1 = cv2.Canny(img1, 30, 200)
image, cnts = cv2.findContours(img1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#cnts = np.asarray(cnts, dtype='uint8')
cnts = np.array(cnts)
cv2.imshow('Edge', img1)
print('cnts var data type', cnts.dtype)
#print("Hi")
img = cv2.drawContours(img, [image], -1, (255, 255, 0), 3)
Here is the Python IDLE shell result:
cnts var data type is int32
Traceback (most recent call last):
File "D:\PyCharm Projects\Test_1_docscanner.py", line 20, in <module>
img=cv2.drawContours(img,[image],-1,(255,255,0),3)
TypeError: contours is not a numpy array, neither a scalar
I finally got it working. The following is what I did:
First, I had previously messed up most of my environment variables, having suppressed some system variables. With the help of a friend I retrieved as many as I could and deleted those I had ignorantly created.
Secondly, I uninstalled all other Python versions (at least I tried; it seems that I still see their icons around, they seem to be "undeletable"), including the one I was using (Python 3.7.3). I then installed Python 3.7.4.
Thirdly, and this must be the answer: I added the line cnts=imutils.grab_contours(cnts) before the cv2.drawContours() call, getting this from Adrian Rosebrock's imutils package on GitHub. My code now works because of that line, which parses the contours for whatever OpenCV version you are using, thereby avoiding version conflicts between cv2.findContours() and cv2.drawContours().
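For clarity, a minimal sketch of that fix, reusing the variable names from the code above:

import cv2
import imutils

# cv2.findContours returns a version-dependent tuple:
# OpenCV 3.x -> (image, contours, hierarchy), OpenCV 4.x -> (contours, hierarchy)
cnts = cv2.findContours(img1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# grab_contours picks out the contour list regardless of the version
cnts = imutils.grab_contours(cnts)
img = cv2.drawContours(img, cnts, -1, (255, 255, 0), 3)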
In conclusion, I had tried imutils.grab_contours() prior to these changes on Python 3.7.3 and it did not work. But I believe the combination of cnts=imutils.grab_contours(cnts) and the update to Python 3.7.4 is what solved the issue.
Hope this is helpful
In Python 2.7.10, Anaconda 2.3.0 (64-bit), if I write
sys.path.append('C:\\Anaconda\\sms-tools-master\\software\\models\\utilFunctions_C\\')
print sys.path
I get
C:\Anaconda\sms-tools-master\workspace\A1>python A1Part1.py
['C:\Anaconda\sms-tools-master\workspace\A1', 'C:\Anaconda\python27.zip',
'C:\Anaconda\DLLs', 'C:\Anaconda\lib', 'C:\Anaconda\lib\plat-win',
'C:\Anaconda\lib\lib-tk', 'C:\Anaconda', 'C:\Anaconda\lib\site-packages',
'C:\Anaconda\lib\site-packages\Sphinx-1.3.1-py2.7.egg',
'C:\Anaconda\lib\site-packages\cryptography-0.9.1-py2.7-win-amd64.egg',
'C:\Anaconda\lib\site-packages\win32', 'C:\Anaconda\lib\site-packages\win32\lib',
'C:\Anaconda\lib\site-packages\Pythonwin',
'C:\Anaconda\lib\site-packages\setuptools-17.1.1-py2.7.egg',
'C:\Anaconda\sms-tools-master\software\models\utilFunctions_C\']
Is this absolute way of adding to sys.path correct? Is there a relative way?
In the next line of Python code I wrote
from utilFunctions_C import wavread
I instantly get
ImportError: cannot import name wavread
if I run the code in cmd; but if I run the code inside IDLE I get:
['C:\Anaconda\sms-tools-master\workspace\A1',
'C:\Python27\Lib\idlelib', 'C:\Windows\system32\python27.zip',
'C:\Python27\DLLs', 'C:\Python27\lib',
'C:\Python27\lib\plat-win', 'C:\Python27\lib\lib-tk',
'C:\Python27', 'C:\Python27\lib\site-packages',
'C:\Anaconda\sms-tools-master\software\models\utilFunctions_C\']
Traceback (most recent call last):
  File "C:\Anaconda\sms-tools-master\workspace\A1\A1Part1.py", line 8, in <module>
    from utilFunctions_C import wavread
ImportError: DLL load failed: %1 is not a valid Win32 application.
So why is there a difference, and how do I tackle this issue? Thanks!
I commented
from utilFunctions_C import wavread
and used
from scipy.io.wavfile import read
Now my code is okay. I found that
utilFunctions.wavread() is a wrapper that uses scipy.io.wavfile.read()
and scales the data to floating point between -1 and 1. If you open up
utilFunctions.py you will see that.
You can use scipy.io.wavfile.read as well, as long as you scale the data
correctly according to the datatype in the wav file. Due to that scaling, for
wav files that store samples as int16, the values returned by
scipy.io.wavfile.read will be 32767 times the values returned by
utilFunctions.wavread.
The lectures used the function to explain the process more explicitly.
Once you have understood it, you can use the wrapper utilFunctions.wavread
for the rest of the course and in practical applications.
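A minimal sketch of that scaling, assuming a wav file that stores int16 samples (the file name is a placeholder):

import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read('sound.wav')  # placeholder file name
if x.dtype == np.int16:
    # int16 samples span [-32768, 32767]; dividing by 32767 gives
    # floats in roughly [-1, 1], matching utilFunctions.wavread
    x = x.astype(np.float32) / 32767.0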
Scroll through
https://class.coursera.org/audio-002/forum/search?q=Cannot+import+name+wavread#15-state-query=wavread&15-state-page_num=1
for more details.
I have a huge dataset on which I wish to run PCA. I am limited by RAM and the computational efficiency of PCA.
Therefore, I shifted to using incremental PCA.
Dataset size: (140000, 3504)
The documentation states: "This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory."
This is really good, but I am unsure how to take advantage of it.
I tried loading one memmap, hoping it would be accessed in chunks, but my RAM blew up.
My code below ends up using a lot of RAM:
ut = np.memmap('my_array.mmap', dtype=np.float16, mode='w+', shape=(140000, 3504))
clf = IncrementalPCA(copy=False)
X_train = clf.fit_transform(ut)
When I say "my RAM blew", the Traceback I see is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 433, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\incremental_pca.py", line 171, in fit
    X = check_array(X, dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 347, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError
How can I improve this without compromising accuracy by reducing the batch size?
My ideas for diagnosing this:
I looked at the sklearn source code, and in the fit() function (source code) I can see the following. This makes sense to me, but I am still unsure about what is wrong in my case.
for batch in gen_batches(n_samples, self.batch_size_):
    self.partial_fit(X[batch])
return self
Edit:
Worst-case scenario, I will have to write my own code for incremental PCA which batch-processes by reading and closing .npy files. But that would defeat the purpose of taking advantage of the already-present hack.
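For what it's worth, a minimal sketch of that manual batching via IncrementalPCA.partial_fit, slicing the existing memmap rather than separate .npy files (the chunk size, component count and float32 cast are my own assumptions):

import numpy as np
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=50)  # component count chosen arbitrarily
ut = np.memmap('my_array.mmap', dtype=np.float16, mode='r', shape=(140000, 3504))
chunk = 1000  # placeholder batch size; must be >= n_components
for start in range(0, ut.shape[0], chunk):
    # np.asarray with an explicit dtype copies only this slice into RAM
    block = np.asarray(ut[start:start + chunk], dtype=np.float32)
    ipca.partial_fit(block)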
Edit2:
If I could somehow delete a batch of the memmap file once it has been processed, that would make much more sense.
Edit3:
Ideally, if IncrementalPCA.fit() just uses batches, it should not crash my RAM. I am posting the whole code, just to make sure I am not making a mistake in flushing the memmap completely to disk beforehand.
temp_train_data = X_train[1000:]
temp_labels = y[1000:]
# out must be a np.memmap for .flush() to exist; the file name is assumed here
out = np.memmap('out.mmap', dtype=np.int64, mode='w+', shape=(200001, 3504))
for index, row in enumerate(temp_train_data):
    actual_index = index + 1000
    data = X_train[actual_index - 1000:actual_index + 1].ravel()
    __, cd_i = pywt.dwt(data, 'haar')
    out[index] = cd_i
out.flush()
pca_obj = IncrementalPCA()
clf = pca_obj.fit(out)
Surprisingly, I note that out.flush() doesn't free my memory. Is there a way to use del out to free the memory completely and then pass a pointer to the file to IncrementalPCA.fit()?
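For illustration, the pattern I have in mind (reusing the assumed out.mmap file name from the code above):

out.flush()
del out  # drop the write-mode memmap object; the data stays on disk
# reopen read-only and hand the memmap straight to fit()
out = np.memmap('out.mmap', dtype=np.int64, mode='r', shape=(200001, 3504))
clf = IncrementalPCA().fit(out)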
You have hit a problem with sklearn in a 32-bit environment. I presume you are using np.float16 because you're in a 32-bit environment and you need it to allow you to create the memmap object without numpy throwing errors.
In a 64-bit environment (tested with Python 3.3 64-bit on Windows), your code just works out of the box. So, if you have a 64-bit computer available, install 64-bit Python and 64-bit numpy, scipy and scikit-learn, and you are good to go.
Unfortunately, if you cannot do this, there is no easy fix. I have raised an issue on github here, but it is not easy to patch. The fundamental problem is that, within the library, if your type is float16, a copy of the array into memory is triggered. The detail of this is below.
So, I hope you have access to a 64-bit environment with plenty of RAM. If not, you will have to split up your array yourself and batch-process it, a rather large task...
N.B. It's really good to see you going to the source to diagnose your problem :) However, if you look at the line where the code fails (see the Traceback), you will see that the for batch in gen_batches loop that you found is never reached.
Detailed diagnosis:
The actual error generated by the OP's code:
import numpy as np
from sklearn.decomposition import IncrementalPCA

ut = np.memmap('my_array.mmap', dtype=np.float16, mode='w+', shape=(140000, 3504))
clf = IncrementalPCA(copy=False)
X_train = clf.fit_transform(ut)
is
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 433, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\incremental_pca.py", line 171, in fit
    X = check_array(X, dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 347, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError
The call to check_array (code link) uses dtype=np.float, but the original array has dtype=np.float16. Even though the check_array() function defaults to copy=False and passes this on to np.array(), the flag is ignored (as per the docs) because the requested dtype is different; therefore np.array makes a copy.
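A tiny stand-alone illustration of that behaviour, with nothing sklearn-specific involved:

import numpy as np

a = np.arange(4, dtype=np.float16)
b = np.array(a, dtype=np.float64, copy=False)  # copy=False is ignored here
print(np.shares_memory(a, b))  # False: the dtype change forced a copy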
This could be solved in the IncrementalPCA code by ensuring that the dtype was preserved for arrays with dtype in (np.float16, np.float32, np.float64). However, when I tried that patch, it only pushed the MemoryError further along the chain of execution.
The same copying problem occurs when the code calls linalg.svd() from the main scipy code, and this time the error occurs during a call to gesdd(), a wrapped native function from LAPACK. Thus, I do not think there is a way to patch this (at least not an easy way; it would at minimum mean altering code in core scipy).
Does the following alone trigger the crash?
X_train_mmap = np.memmap('my_array.mmap', dtype=np.float16,
mode='w+', shape=(n_samples, n_features))
clf = IncrementalPCA(n_components=50).fit(X_train_mmap)
If not, then you can use that model to transform (project) your data iteratively into smaller data using batches:
from sklearn.utils import gen_batches  # import needed for the loop below

X_projected_mmap = np.memmap('my_result_array.mmap', dtype=np.float16,
                             mode='w+', shape=(n_samples, clf.n_components))
for batch in gen_batches(n_samples, clf.batch_size_):  # batch_size_ is set by fit()
    X_batch_projected = clf.transform(X_train_mmap[batch])
    X_projected_mmap[batch] = X_batch_projected
I have not tested that code but I hope that you get the idea.
I'm developing a small game with pyglet. One centerpiece is, of course, drawing coloured rectangles. I initially did this by creating images in memory and blit()ing them, which worked fine. After noticing how ugly, roundabout and inefficient this is (yes, I profiled - ColorRect.draw() took significant time and became 10x more efficient through this change), I've started creating vertex lists instead, via pyglet.graphics.Batch (I copied most of the code verbatim from one of the examples). Since then, I experience a weird exception in some low-level OpenGL code, for which I have failed to find a cause or a reliable reproduction.
There is no apparent relation to gameplay events -- as in, nothing exceptional happens just before, or I constantly miss it. As the error occurs somewhere deep in the event loop, I cannot easily track down which position update causes it. Honestly, I'm stumped. Thus I'll braindump what I have found out and hope for some kind psychic.
I've tried it out on Windows 7 32-bit (I may get around to trying it on Ubuntu 11.10 soon) with Python 3.2.2, with pyglet revision 043180b64260 (pulled from Google Code and built from source; the 1.1.4 release is harder to install as it doesn't run 2to3 automatically, though it appears to be equally py3k-ready). I'll probably update to the latest mercurial version next, but it's only a few commits behind and the changes seem entirely unrelated.
The full traceback (I censored some paths on principle; note it runs in its own virtualenv):
Traceback (most recent call last):
File "<my main file>", line 152, in <module>
main()
File "<my main file>", line 148, in main
run()
File "<my main file>", line 125, in run
pyglet.app.run()
File "<virtualenv>\Lib\site-packages\pyglet\app\__init__.py", line 123, in run
event_loop.run()
File "<virtualenv>\Lib\site-packages\pyglet\app\base.py", line 135, in run
self._run_estimated()
File "<virtualenv>\Lib\site-packages\pyglet\app\base.py", line 164, in _run_estimated
timeout = self.idle()
File "<virtualenv>\Lib\site-packages\pyglet\app\base.py", line 278, in idle
window.switch_to()
File "<virtualenv>\Lib\site-packages\pyglet\window\win32\__init__.py", line 305, in switch_to
self.context.set_current()
File "<virtualenv>\Lib\site-packages\pyglet\gl\win32.py", line 213, in set_current
super(Win32Context, self).set_current()
File "<virtualenv>\Lib\site-packages\pyglet\gl\base.py", line 320, in set_current
buffers = (gl.GLuint * len(buffers))(*buffers)
IndexError: invalid index
Running post-mortem pdb shows the following (actively stepping through the code until the error happens used to be infeasible, as the FPS went from 60 down to 7):
buffers is a list of ints; I have no idea what these represent or where they come from, but they are pulled from a list called self.object_space._doomed_textures (where self is a window object). The associated comment says this block of code releases textures scheduled for deletion. I don't think I explicitly use textures anywhere, but who knows what pyglet does under the hood. I assume these integers are the IDs of the textures to be destroyed.
gl.GLuint is an alias for ctypes.c_ulong; thus (gl.GLuint * len(buffers))(*buffers) creates a ulong array of the same length and contents.
I can evaluate the very same expression at the pdb prompt without errors or data corruption.
Independent experiments (outside the virtualenv and without importing pyglet) with ctypes show that IndexError is raised if too many arguments are given to the array constructor. This makes no sense here: both experimentation and logic suggest the length and argument count must always match.
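For what it's worth, a quick stand-alone reproduction of that ctypes experiment:

import ctypes

buffers = [1, 2, 3]
arr = (ctypes.c_ulong * len(buffers))(*buffers)  # lengths match: works fine
print(list(arr))  # [1, 2, 3]

(ctypes.c_ulong * 2)(1, 2, 3)  # one initializer too many
# IndexError: invalid index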
Are there other cases where this exception may occur? May this be a bug of pyglet, or am I misusing the library and missed the associated warning?
Would the code which creates and maintains the vertex lists be of any use in debugging this? There's probably something wrong with it. I've already stared at it, but since I have little experience with pyglet.graphics, this was of limited use. Just leave a comment if you'd like to see the ColorRect code.
Any other ideas what might cause this?
It is a bit hard to provide a really relevant answer since no code is provided, but here is what I can see from the error output.
buffers = (gl.GLuint * len(buffers))(*buffers)
So if I understand correctly, you are multiplying the size of a GLuint (4 bytes) by your actual buffer's length (if initialized). Maybe that's why your index is invalid: because it is too high?
Usually it would be OK, since a buffer is in bytes, but you said that it is a list of ints?
Hope it helps
I am getting an error (see below) when trying to use cv.CreateHist in Python. I am also noticing another alarming problem: if I spit out all of the attributes of the cv module into a file and then search them, I find that a ton of common things are missing.
For example, cv.TermCriteria() is not there; cv.ConnectedComp is not there; and cv.CvRect is not there.
Everything else about my installation, with OpenCV 2.2, works just fine. I can plot images, make CvScalars, and call plenty of the functions, like cv.CamShift... but there are a dozen or so of these hit-or-miss functions or data structures that are simply missing with no explanation.
Here's my code for cv.CreateHist:
import cv
q = cv.CreateHist([1],1,cv.CV_HIST_ARRAY)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: x9y��
The weird wingding stuff is actually what it spits out at the command line, not a copy-paste error. Can anyone help figure this out? It's incredibly puzzling.
Ely
As for CvRect, see the documentation. It says such types are represented as Pythonic tuples.
As for your call to CreateHist, you may be passing the arguments in the wrong order. See createhist in the docs for Python OpenCV.
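For illustration, a call with the arguments in the documented order for the legacy cv API (the bin count and range are arbitrary example values, so treat this as a sketch rather than a verified snippet):

import cv

# dims first, then the histogram type, then one [min, max] range per dimension
hist = cv.CreateHist([32], cv.CV_HIST_ARRAY, [[0, 256]], 1)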