Python sklearn ValueError: array is too big - python

I made a simple script in Python (3.7) that classifies a satellite image, but it can only classify a clip of the satellite image. When I try to classify the whole satellite image, it returns this:
Traceback (most recent call last):
File "v0-3.py", line 219, in classification_tool
File "sklearn\cluster\k_means_.py", line 972, in fit
File "sklearn\cluster\k_means_.py", line 312, in k_means
File "sklearn\utils\validation.py", line 496, in check_array
File "numpy\core\_asarray.py", line 85, in asarray
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
I tried using MiniBatchKMeans instead of KMeans (following Sklearn.KMeans : how to avoid Memory or Value Error?), but it still doesn't work. How can I avoid or solve this error? Are there maybe some mistakes in my code?

Oh, I'm an idiot: I was using the 32-bit (x32) version of Python instead of the 64-bit (x64) one.
Reinstalling Python as the x64 version may solve your problem too.
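For reference, a quick way to check which interpreter you are running, plus a chunked MiniBatchKMeans sketch that avoids allocating one huge array (my own illustration; the file name, chunk size, and cluster count are placeholders):
import struct
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# 32 here means a 32-bit interpreter, which caps a single allocation far below the RAM you actually have.
print(struct.calcsize("P") * 8, "bit Python")

# Hypothetical pixel matrix: one row per pixel, one column per spectral band.
pixels = np.load("satellite_pixels.npy")

# partial_fit lets the model see the image in slices instead of one giant array.
km = MiniBatchKMeans(n_clusters=8, random_state=0)
for start in range(0, pixels.shape[0], 100_000):
    km.partial_fit(pixels[start:start + 100_000])

labels = km.predict(pixels[:100_000])  # prediction can also be done slice by slice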

Related

SVM-handwriting-recognition-master

Good day everyone. I am currently working on a project to classify handwriting using an SVM classifier, after downloading and resizing the dataset from NIST Special Database 19. On trying to train my model, I keep getting this error:
File "train_model.py", line 65, in <module>
x_pt = preprocessing.scale(x_pt)
File "C:\Users\Judson_Morgan\anaconda3\lib\site-packages\sklearn\preprocessing\_data.py", line 142, in scale
force_all_finite='allow-nan')
File "C:\Users\Judson_Morgan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 586, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 144)) while a minimum of 1 is required by the scale function.
How do I go about resolving this issue?
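The traceback itself points at the cause: the matrix reaching preprocessing.scale has zero rows (shape=(0, 144)), so whatever loads and resizes the NIST images is producing no samples. Below is a minimal guard before scaling, with a placeholder array standing in for the real feature matrix (my own sketch, not the original training script):
import numpy as np
from sklearn import preprocessing

# Placeholder for the feature matrix the script builds from the resized images;
# an empty array like this reproduces the reported error.
x_pt = np.empty((0, 144))

if x_pt.shape[0] == 0:
    raise ValueError("No samples were loaded - check the dataset path and the "
                     "loop that reads and resizes the images.")

x_pt = preprocessing.scale(x_pt)  # only reached once there is at least one row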

How to convert caffe model and weight to pytorch

Hello, I need help converting a Caffe model and its weights to PyTorch. I tried the GitHub repository that most other posts suggest (this github), but when I used it I ran into a lot of problems, since it targets Python 2 and I am using Python 3. I have already tried removing some layers the repository doesn't cover and manually changing old syntax to the new syntax, but the last error comes from PyTorch's nn module and I have no idea how to fix that.
Traceback (most recent call last):
File "caffe2pytorch.py", line 30, in <module>
pytorch_blobs, pytorch_models = forward_pytorch(protofile, weightfile)
File "caffe2pytorch.py", line 17, in forward_pytorch
net = caffenet.CaffeNet(protofile)
File "/home/cgal/reference/SfSNet/caffe2pytorch/caffenet.py", line 384, in __init__
self.add_module(name, model)
File "/home/cgal/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 186, in add_module
raise KeyError("module name can't contain \".\"")
KeyError: 'module name can\'t contain "."'
So is there any suggestion on how to convert Caffe weights and models to PyTorch?
This is the Caffe model that I want to convert: download here
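For what it's worth, the KeyError comes from torch.nn.Module.add_module, which refuses submodule names containing a dot, and Caffe layer names frequently contain one. A workaround I would try around the self.add_module(name, model) call in caffenet.py is renaming the layer first (the renaming rule is my own sketch, not part of the converter):
import torch.nn as nn

def add_module_safe(parent: nn.Module, name: str, child: nn.Module) -> None:
    # PyTorch uses "." as the path separator in state_dict keys, so it bans
    # dots inside individual module names; replace them before registering.
    parent.add_module(name.replace(".", "_"), child)

Any later lookups of that layer, and the keys of the converted weights, have to use the same renamed form.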

Tensorflow Dataset API - .from_tensor_slices() / .from_tensor() - cannot create a tensor proto whose content is larger than 2gb

So I want to use the Dataset API to batch my large dataset (~8 GB), because I am suffering from long GPU idle times while passing data from Python to TensorFlow via feed_dict.
I followed the tutorial mentioned here:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/5_DataManagement/tensorflow_dataset_api.py
but when running my simple code:
one_hot_dataset = np.load("one_hot_dataset.npy")
dataset = tf.data.Dataset.from_tensor_slices(one_hot_dataset)
I am getting the error message with TensorFlow 1.8 and Python 3.5:
Traceback (most recent call last):
File "<ipython-input-17-412a606c772f>", line 1, in <module>
dataset = tf.data.Dataset.from_tensor_slices((one_hot_dataset))
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 235, in from_tensor_slices
return TensorSliceDataset(tensors)
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1030, in __init__
for i, t in enumerate(nest.flatten(tensors))
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1030, in <listcomp>
for i, t in enumerate(nest.flatten(tensors))
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1014, in convert_to_tensor
as_ref=False)
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1104, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/anaconda2/envs/tf/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 496, in make_tensor_proto
"Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
How can I solve this? I think the cause is obvious, but what were the TF developers thinking when they limited the input data to 2 GB?! I really cannot understand the rationale, and what is the workaround when dealing with larger datasets?
I googled quite a lot but could not find any similar error message. When I use a fifth of the numpy dataset, the steps above work without any issues.
I somehow need to tell TensorFlow that I will actually be loading the data batch by batch, and that I probably want to prefetch a few batches to keep my GPU busy. But it seems as if it is trying to load the whole numpy dataset at once, so what is the benefit of using the Dataset API? I can reproduce this error simply by loading my numpy dataset as a tf.constant into the TensorFlow graph, which obviously does not fit, and I get OOM errors.
Tips and troubleshooting hints appreciated!
This issue is addressed in the tf.data user guide (https://www.tensorflow.org/guide/datasets) in "Consuming NumPy arrays" section.
Basically, define the dataset in terms of a tf.placeholder(), create a dataset.make_initializable_iterator() iterator, and feed your NumPy array when you run the iterator's initializer at runtime.
If this does not work for some reason, you can write your data to files or create a dataset from a Python generator (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator), where you can put arbitrary Python code, including slicing your numpy array and yielding the slices.
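A minimal sketch of that placeholder pattern against the TF 1.x API used in the question (batch size, prefetch depth, and the loop are illustrative):
import numpy as np
import tensorflow as tf  # TensorFlow 1.x, as in the question

one_hot_dataset = np.load("one_hot_dataset.npy")

# The placeholder keeps the ~8 GB array out of the graph definition,
# so the 2 GB tensor-proto limit no longer applies.
data_ph = tf.placeholder(one_hot_dataset.dtype, one_hot_dataset.shape)
dataset = tf.data.Dataset.from_tensor_slices(data_ph).batch(128).prefetch(2)

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    # The array is fed exactly once, when the iterator is initialized.
    sess.run(iterator.initializer, feed_dict={data_ph: one_hot_dataset})
    while True:
        try:
            batch = sess.run(next_batch)  # train on the batch here
        except tf.errors.OutOfRangeError:
            break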

CountVectorizer() in scikit-learn Python gives Memory error when feeding big Dataset. Same code with Smaller dataset works fine, what am I missing?

I am working on a two-class machine learning problem. The training set contains 2 million rows of URLs (strings) with labels 0 and 1, and a LogisticRegression() classifier should predict one of the two labels when test data is passed in. I get about 95% accuracy when I use a smaller dataset, i.e. 78,000 URLs with 0/1 labels.
The problem I am having is that when I feed in the big dataset (2 million rows of URL strings) I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/Slim/.xy/startups/start/chi2-94.85 - Copy.py", line 48, in <module>
bi_counts = bi.fit_transform(url_list)
File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 780, in fit_transform
vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
File "C:\Python27\lib\site-packages\sklearn\feature_extraction\text.py", line 717, in _count_vocab
j_indices.append(vocabulary[feature])
MemoryError
My code, which works on the smaller dataset with fair enough accuracy, is:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

# character 3-grams over the raw URL strings
bi = CountVectorizer(ngram_range=(3, 3), binary=True, max_features=9000, analyzer='char_wb')
bi_counts = bi.fit_transform(url_list)
# use_idf belongs on the transformer itself, not on fit_transform
tf = TfidfTransformer(norm='l2', use_idf=True)
X_train_tf = tf.fit_transform(bi_counts)
clf = LogisticRegression(penalty='l1', intercept_scaling=0.5, random_state=True)
clf.fit(X_train_tf, y)
I tried keeping 'max_features' as small as possible, say max_features=100, but I still get the same result.
Please note:
I am using a Core i5 with 4 GB RAM
I tried the same code on 8 GB RAM, but no luck
I am using Python 2.7.6 with sklearn, NumPy 1.8.1, SciPy 0.14.0, Matplotlib 1.3.1
UPDATE:
@Andreas Mueller suggested using HashingVectorizer(). I used it with both the small and the large dataset: the 78,000-row dataset ran successfully, but the 2-million-row dataset gave me the same memory error as shown above. I tried it on 8 GB RAM, and memory in use was around 30% while processing the big dataset.
IIRC the max_features is only applied after the whole dictionary is computed.
The easiest way out is to use the HashingVectorizer that does not compute a dictionary.
You will lose the ability to get the corresponding token for a feature, but you shouldn't run into memory issues any more.
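A minimal sketch of that swap, keeping the question's character 3-gram setup (n_features is a knob to tune; url_list here is a stand-in for the 2 million URLs):
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

url_list = ["http://example.com/a", "https://example.org/b"]  # stand-in data

# No vocabulary is stored, so memory use stays flat however many URLs come in.
hv = HashingVectorizer(ngram_range=(3, 3), analyzer='char_wb', binary=True,
                       n_features=2 ** 18)
X_counts = hv.transform(url_list)  # stateless, so there is no fit step
X_train_tf = TfidfTransformer(norm='l2', use_idf=True).fit_transform(X_counts)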

PIL (Image) ValueError: Not a valid number of quantization tables. Should be between 1 and 4

I want to draw a rectangle on a picture and save it as a new file. What I'm doing is below:
from PIL import Image
from PIL import ImageChops
from PIL import ImageDraw
im = Image.open('the animal picture.jpg')
draw = ImageDraw.Draw(im)
draw.rectangle((69, 17, 418, 107))
im = im.convert('RGB')
im.save('new.jpg')
It gives an error message:
Traceback (most recent call last):
File "C:\Python27\draw_re.py", line 9, in <module>
im.save('new.jpg')
File "C:\Python27\lib\PIL\Image.py", line 1439, in save
save_handler(self, fp, filename)
File "C:\Python27\lib\PIL\JpegImagePlugin.py", line 471, in _save
ImageFile._save(im, fp, [("jpeg", (0,0)+im.size, 0, rawmode)])
File "C:\Python27\lib\PIL\ImageFile.py", line 494, in _save
for e, b, o, a in tile:
ValueError: Not a valid number of quantization tables. Should be between 1 and 4.
It looks like the same problem as in PIL - Not a valid numbers of quantization tables. Should be between 2 and 4, but the tip there doesn't solve the problem, and it makes batch processing impossible.
I worked it out. The problem was caused by the Image and PIL libraries I was using.
I uninstalled and removed all previously installed PIL and Image libraries (there had been confusion and difficulties in the original installations, so I had crossed files and folders for the libraries).
I did the uninstallations through pip and through "Control Panel\All Control Panel Items\Programs and Features" in Windows, and I also manually removed the leftover folders and files.
Pillow is the one that should be used. I downloaded an MS Windows installer from https://pypi.python.org/pypi/Pillow/2.6.1 and installed it. I ran the script and it works fine.
