Memory Error: Numpy.random.normal - python

In Theano, the following code snippet throws a MemoryError:
self.w = theano.shared(
    np.asarray(
        np.random.normal(
            loc=0.0, scale=np.sqrt(1.0/n_out), size=(n_in, n_out)),
        dtype=theano.config.floatX),
    name='w', borrow=True)
For context, n_in = 64*56*56 and n_out = 4096. The snippet is taken from the __init__ method of a fully connected layer. See the traceback:
Traceback (most recent call last):
File "<stdin>", line 8, in <module>
File "final.py", line 510, in __init__
loc=0.0, scale=np.sqrt(1.0/n_out), size=(n_in, n_out)),
File "mtrand.pyx", line 1636, in mtrand.RandomState.normal (numpy/random/mtrand/mtrand.c:20676)
File "mtrand.pyx", line 242, in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:7401)
MemoryError
Is there any way we can get around the problem?

A MemoryError is Python's way of saying: "I tried to get enough memory for that operation, but your OS says it doesn't have enough."
So there is no direct workaround; you have to do it another way (or buy more RAM!). I don't know what your floatX is, but your array contains 64*56*56*4096 = 822,083,584 elements, which translates to:
6.125 GB if you use float64
3.063 GB if you use float32
1.531 GB if you use float16 (not sure float16 is supported for your operations, though)
But the problem with MemoryErrors is that avoiding them once generally isn't enough. If you don't change your approach, you'll run into trouble again as soon as an operation requires an intermediate or new array (then you have two huge arrays), or coerces to a higher dtype (then you have two huge arrays, and the new one needs even more space).
So the only viable workaround is to change the approach; maybe you can start by calculating subsets (a map-reduce approach)?
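If a float32 parameter matrix is acceptable, one way to reduce the peak memory is to preallocate the float32 array and fill it in row blocks, so the float64 output of np.random.normal only ever exists one block at a time. A minimal sketch; the init_weights helper and the block size are illustrative, not part of the original code:

```python
import numpy as np

def init_weights(n_in, n_out, chunk=4096):
    """Fill a float32 weight matrix block by block.

    np.random.normal always returns float64; drawing the whole
    (n_in, n_out) matrix and casting afterwards keeps both the float64
    original and the float32 copy alive at once.  Filling a preallocated
    float32 array limits the float64 intermediate to one row block.
    """
    w = np.empty((n_in, n_out), dtype=np.float32)
    scale = np.sqrt(1.0 / n_out)
    for start in range(0, n_in, chunk):
        stop = min(start + chunk, n_in)
        w[start:stop] = np.random.normal(
            loc=0.0, scale=scale, size=(stop - start, n_out))
    return w

# Small demo sizes; the question's sizes would be (64*56*56, 4096).
w = init_weights(256, 64, chunk=100)
```

The result can then be handed to theano.shared as before; halving the dtype halves the resident size, but any later float64 intermediate would bring the problem back.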

Related

Numpy memory error while using audio analysis using python

I get the following error when I test an audio file larger than 100 MB.
Traceback (most recent call last):
File "C:\Users\opensource\Desktop\pyAudioAnalysis-master\audioFeatureExtraction.py", line 542, in stFeatureExtraction
signal = numpy.double(signal)
MemoryError
Assuming your data was int16 before, upcasting to float64 quadruples the size of your array. That is likely more than the memory you had left, so it threw a MemoryError.
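If the downstream feature extraction tolerates float32, casting to it instead of numpy.double halves the size of the upcast copy. A quick sketch with a synthetic int16 signal standing in for the audio data:

```python
import numpy as np

# Synthetic int16 "audio" signal; a real one would come from the file.
signal = np.zeros(1000000, dtype=np.int16)

as_double = np.double(signal)          # float64: 8 bytes per sample
as_single = signal.astype(np.float32)  # float32: 4 bytes per sample

print(signal.nbytes, as_double.nbytes, as_single.nbytes)
```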

Dilation image-recognition algorithms cityScapes model

I am trying to use this image-recognition algorithm with the CityScapes model:
https://github.com/fyu/dilation
However, I keep on getting the following error:
bash-4.2$ python predict.py cityscapes sunny_1336601.png --gpu 0
Using GPU 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
Traceback (most recent call last):
File "predict.py", line 133, in <module>
main()
File "predict.py", line 129, in main
predict(args.dataset, args.input_path, args.output_path)
File "predict.py", line 98, in predict
color_image = dataset.palette[prediction.ravel()].reshape(image_size)
ValueError: cannot reshape array of size 12582912 into shape (1090,1920,3)
I tried reshaping the image to every common resolution I could think of, including 640x480, but I have been getting the same error.
Any help or tips are highly appreciated.
Thanks!
I don't have enough reputation to comment, so I am posting my hunch as an answer (forgive me if I'm wrong): the given size 12582912 has to be the product of the three numbers in the tuple. A quick factorisation shows 12582912 = 1024*768*16 = 2048*1536*4. So, if the image is a 4-channel image, the resolution is 2048 x 1536, which is the standard 4:3 aspect ratio.
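The hunch can be checked mechanically: reshape succeeds only when the product of the target shape equals the array size, so enumerating factor pairs for an assumed channel count narrows down the true resolution. A small sketch; the channel count of 4 is a guess, as above:

```python
import numpy as np

size = 12582912              # element count from the ValueError
bad_shape = (1090, 1920, 3)  # the shape predict.py tried

# reshape fails exactly because the element counts differ.
print(int(np.prod(bad_shape)), size)

# Candidate (height, width) pairs assuming a 4-channel image.
pixels = size // 4
candidates = [(h, pixels // h)
              for h in range(1, int(pixels ** 0.5) + 1)
              if pixels % h == 0]
print((1536, 2048) in candidates)
```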
It turns out that the CityScapes model only accepts a specific size: the width should be twice the height.
If you know Python well, you will see that this ValueError is an internal code error; it has nothing to do with missing dependencies or the environment.
It comes from the fact that the image had one total size, was flattened into an array, and was then reshaped into different dimensions.
That is not something that can or should be fixed by tampering with the input data, but by addressing the bug in the provided library itself.
This kind of restriction is very common with neural-network classifiers: once the layers are trained they cannot be changed, so the input must have a very specific shape. The input can still be "cooked" before being given to the network, but that is usually non-destructive, basic scaling that preserves the proportions, which is what the library gets wrong.

Memory Error while using the binary opening operation in the scikit-image package for granulometry

I get a memory error when I am using the opening operation in the scikit-image package (it saturates my RAM). This memory error occurs for a 3-D structuring element which is a sphere/ball of radius 16 or larger. I am trying to use granulometry to measure the size distribution of objects in the image (a 3D array), so I need structuring elements of increasing radii. The memory requirements also increase exponentially, and I cannot find a way around it. Is there a simple solution to this problem so that I can use structuring elements of even greater radii? The image size is 200x200x200. TIA
Traceback (most recent call last):
File "R3.py", line 124, in <module>
output_image = skimage.morphology.binary_opening(image, ball)
File "/usr/lib/python2.7/dist-packages/skimage/morphology/binary.py", line 117, in binary_opening
eroded = binary_erosion(image, selem)
File "/usr/lib/python2.7/dist-packages/skimage/morphology/binary.py", line 41, in binary_erosion
ndimage.convolve(binary, selem, mode='constant', cval=1, output=conv)
File "/usr/lib/python2.7/dist-packages/scipy/ndimage/filters.py", line 696, in convolve
origin, True)
File "/usr/lib/python2.7/dist-packages/scipy/ndimage/filters.py", line 544, in _correlate_or_convolve
_nd_image.correlate(input, weights, output, mode, cval, origins)
MemoryError
A volume of dimensions 200x200x200 is pretty small. A granulometry is made of sequential openings, so you only need two more volumes for the computation: one temporary volume between the erosion and the dilation, and one more for the final result. That makes three volumes in total. And the structuring element should be stored as a list of coordinates, so nothing too big.
Consequently, there is absolutely no reason you cannot perform a granulometry on your computer for a volume of such dimensions. The only explanation for the exponential memory use would be that the intermediate results are not being freed.
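The answer above can be sketched in code: preallocate the two extra volumes once and reuse them via the output= argument of scipy.ndimage (the same functions skimage delegates to in the traceback), so memory stays flat as the radius grows. The ball helper and the tiny test volume are illustrative:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
volume = rng.random((40, 40, 40)) > 0.5   # small stand-in for the 200^3 volume

def ball(radius):
    """Boolean ball-shaped structuring element of the given radius."""
    grid = np.mgrid[-radius:radius + 1, -radius:radius + 1, -radius:radius + 1]
    return (grid ** 2).sum(axis=0) <= radius ** 2

# Preallocate the two extra volumes once and reuse them for every radius.
tmp = np.empty(volume.shape, dtype=bool)
out = np.empty(volume.shape, dtype=bool)

counts = []
for radius in (1, 2, 3):
    selem = ball(radius)
    # Opening = erosion followed by dilation, written into the buffers.
    ndimage.binary_erosion(volume, structure=selem, output=tmp)
    ndimage.binary_dilation(tmp, structure=selem, output=out)
    counts.append(int(out.sum()))   # granulometry curve: surviving voxels
```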

How to fit multiple sequences with GMMHMM?

I have a problem with the Python hmmlearn library: I have several training sets, and I would like to fit one Gaussian-mixture HMM model to all of them.
Here is an example working with multiple sequences.
X = np.concatenate([X1, X2])
lengths = [len(X1), len(X2)]
hmm.GaussianHMM(n_components=3).fit(X, lengths)
When I change GaussianHMM to GMMHMM, it returns the following error:
hmm.GMMHMM(n_components=3).fit(X, lengths)
Traceback (most recent call last):
File "C:\Users\Cody\workspace\QuickSilver_HMT\hmm_list_sqlite.py", line 141, in hmm_list_pickle
hmm.GMMHMM(n_components=3).fit(X, lengths)
File "build\bdist.win32\egg\hmmlearn\hmm.py", line 998, in fit
raise ValueError("'lengths' argument is not supported yet")
ValueError: 'lengths' argument is not supported yet
How can one fit multiple sequences with GMMHMM?
The current master version contains a rewrite of GMMHMM which, at some point, did not support multiple sequences. Now it does, so updating should help, as @ppasler suggested.
The re-write is still a work-in-progress. Please report any issues you encounter on the hmmlearn issue tracker.
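For reference, the multi-sequence convention is the same as for GaussianHMM: stack the sequences and pass the per-sequence lengths separately. A minimal sketch with synthetic data; the hmmlearn call is shown as a comment since it requires an up-to-date install:

```python
import numpy as np

# Two hypothetical training sequences, each (n_samples, n_features).
X1 = np.random.randn(100, 2)
X2 = np.random.randn(150, 2)

X = np.concatenate([X1, X2])   # stacked observations
lengths = [len(X1), len(X2)]   # per-sequence sample counts

# With a recent hmmlearn this should then work:
# from hmmlearn import hmm
# model = hmm.GMMHMM(n_components=3).fit(X, lengths)
```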

CountVectorizer() in scikit-learn gives a MemoryError when feeding a big dataset. The same code works fine with a smaller dataset; what am I missing?

I am working on a two-class machine learning problem. The training set contains 2 million rows of URLs (strings) and labels 0 and 1. The classifier, LogisticRegression(), should predict either label when test datasets are passed in. I get 95% accuracy when I use a smaller dataset, i.e. 78,000 URLs with 0 and 1 as labels.
The problem is that when I feed in the big dataset (2 million rows of URL strings), I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/Slim/.xy/startups/start/chi2-94.85 - Copy.py", line 48, in <module>
bi_counts = bi.fit_transform(url_list)
File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 780, in fit_transform
vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 717, in _count_vocab
j_indices.append(vocabulary[feature])
MemoryError
My code, which works for small datasets with fair enough accuracy, is:
bi = CountVectorizer(ngram_range=(3, 3), binary=True, max_features=9000, analyzer='char_wb')
bi_counts = bi.fit_transform(url_list)
tf = TfidfTransformer(norm='l2', use_idf=True)
X_train_tf = tf.fit_transform(bi_counts)
clf = LogisticRegression(penalty='l1', intercept_scaling=0.5, random_state=True)
clf.fit(X_train_tf, y)
I tried to keep max_features as small as possible, say max_features=100, but I still got the same result.
Please note:
I am using a Core i5 with 4 GB of RAM
I tried the same code on 8 GB of RAM, but no luck
I am using Python 2.7.6 with sklearn, NumPy 1.8.1, SciPy 0.14.0, Matplotlib 1.3.1
UPDATE:
@Andreas Mueller suggested using HashingVectorizer(). I used it with both the small and the large dataset: the 78,000-row dataset compiled successfully, but the 2-million-row dataset gave me the same memory error as shown above. I tried it on 8 GB of RAM, with in-use memory at 30% while processing the big dataset.
IIRC, max_features is only applied after the whole dictionary has been computed.
The easiest way out is to use the HashingVectorizer, which does not compute a dictionary.
You will lose the ability to map a feature back to its token, but you shouldn't run into memory issues any more.
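To see why no dictionary is needed, here is a minimal standard-library sketch of the hashing trick that HashingVectorizer is built on: tokens are mapped straight into a fixed index space, so memory is bounded no matter how many distinct n-grams the 2 million URLs contain (names and sizes here are illustrative):

```python
import zlib

N_FEATURES = 2 ** 10   # fixed output dimensionality

def hash_vectorize(tokens, n_features=N_FEATURES):
    """Return a sparse {index: count} vector for a list of tokens.

    No vocabulary is stored; each token is hashed to a bucket, at the
    cost of occasional collisions and losing the index-to-token mapping.
    """
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode()) % n_features
        vec[idx] = vec.get(idx, 0) + 1
    return vec

# Character trigrams of a URL, as CountVectorizer(analyzer='char_wb') would use.
v = hash_vectorize(["htt", "ttp", "tp:", "p:/", "://"])
```

In the question's pipeline, swapping CountVectorizer for HashingVectorizer(analyzer='char_wb', ngram_range=(3, 3)) follows the same idea; sklearn's version also applies a signed hash by default to reduce collision bias.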
