Numpy random memory error - python

Hi, I have a simple line that creates a random array for a rather large dataset:
import numpy as np
import random
N=276233
L=138116
np.random.random([L,N])
But I get this error:
Traceback (most recent call last):
File "<string>", line 3 (23), in <module>
File "mtrand.pyx", line 760, in mtrand.RandomState.random_sample (numpy\random\mtrand\mtrand.c:5713)
File "mtrand.pyx", line 137, in mtrand.cont0_array (numpy\random\mtrand\mtrand.c:1300)
MemoryError
What is the solution, and what is the size limit for such an array?

You are trying to create an array that would require 284GB of memory:
In [16]: L * N * 8 / (1024. ** 3)
Out[16]: 284.25601890683174
Either buy a lot more RAM (and make sure your system can handle it) or find a way to avoid generating a 276,233 x 138,116 matrix all at once, for example by producing it in chunks as sketched below.
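If whatever consumes the matrix only ever needs a block of rows at a time, you can generate it chunk by chunk instead of materializing the whole thing. A minimal sketch under that assumption (the process function is hypothetical):
import numpy as np
N = 276233
L = 138116
chunk_rows = 1000  # one float64 chunk of shape (1000, N) is about 2.2 GB
for start in range(0, L, chunk_rows):
    rows = np.random.random((min(chunk_rows, L - start), N))
    process(rows)  # hypothetical consumer that handles one block of rows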

Related

Save complex numpy array as image using scikit-image

I get the following error when I try to use io.imsave("image.jpg", array):
Traceback (most recent call last):
File "Fourer.py", line 37, in <module>
io.imsave( "test.jpg", fImage2)
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\io\_io.py", line 131, in imsave
if is_low_contrast(arr):
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\exposure\exposure.py", line 503, in is_low_contrast
dlimits = dtype_limits(image, clip_negative=False)
File "C:\ProgramData\Miniconda3\lib\site-packages\skimage\util\dtype.py", line 49, in dtype_limits
imin, imax = dtype_range[image.dtype.type]
KeyError: <class 'numpy.complex128'>
It's a 2D complex array that I use:
array = [[ 3.25000000e+02+0.00000000e+00j -1.25000000e+01+1.72047740e+01j
-1.25000000e+01+4.06149620e+00j -1.25000000e+01-4.06149620e+00j
-1.25000000e+01-1.72047740e+01j]
[-6.25000000e+01+8.60238700e+01j -8.88178420e-16+8.88178420e-16j
0.00000000e+00+1.29059879e-15j 0.00000000e+00+1.29059879e-15j
-8.88178420e-16-8.88178420e-16j]
[-6.25000000e+01+2.03074810e+01j -8.88178420e-16+4.44089210e-16j
-3.55271368e-15+5.46706420e-15j -3.55271368e-15+5.46706420e-15j
-8.88178420e-16-4.44089210e-16j]
[-6.25000000e+01-2.03074810e+01j -8.88178420e-16+4.44089210e-16j
-3.55271368e-15-5.46706420e-15j -3.55271368e-15-5.46706420e-15j
-8.88178420e-16-4.44089210e-16j]
[-6.25000000e+01-8.60238700e+01j -8.88178420e-16+8.88178420e-16j
0.00000000e+00-1.29059879e-15j 0.00000000e+00-1.29059879e-15j
-8.88178420e-16-8.88178420e-16j]]
How can I save a complex array as an image?
If that matrix was obtained from the FFT of an image, then you first need to apply the inverse FFT; only then can you save it using io.imsave.
If that is the case, take a look at skimage's Inverse Fourier Transform example.
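A minimal sketch of that workflow, assuming fImage2 (the name from the traceback above) holds the 2-D FFT of a real image:
import numpy as np
from skimage import io, img_as_ubyte
spatial = np.fft.ifft2(fImage2)               # back to the spatial domain
img = np.real(spatial)                        # the imaginary part should be ~0 for a real source image
img = (img - img.min()) / (img.ptp() or 1.0)  # rescale to [0, 1]
io.imsave("test.jpg", img_as_ubyte(img))      # now a real-valued uint8 array, which imsave accepts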

How do I fix an argument error in an fft function that uses skcuda.cufft?

I want to make a Python-wrapped GPU FFT function that can compute the transforms of arbitrarily sized inputs using scikit-cuda's cufft module. (I tried PyFFT, which only takes powers of 2.)
I modeled my skcuda.cufft code on this CUDA code:
__host__ cuDoubleComplex* FFT(cuDoubleComplex *data, int NX){
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cuDoubleComplex *d_data;
    cudaMalloc((void **)&d_data, NX*sizeof(cuDoubleComplex));
    cufftHandle plan;
    cufftPlan1d(&plan, NX, CUFFT_Z2Z, 1);
    cudaMemcpy(d_data, data, NX*sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);
    cufftExecZ2Z(plan, d_data, d_data, CUFFT_FORWARD);
    cudaMemcpy(data, d_data, NX*sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);
    cufftDestroy(plan);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float elapsedTime;
    cudaEventElapsedTime(&elapsedTime, start, stop);
    printf("\n Elapsed Time: %3.1f ms\n", elapsedTime);
    cudaFree(d_data);
    return data;
}
and my skcuda.cufft code looks like:
import skcuda.cufft as ft
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np
N=100
x=np.array(np.random.random(N),np.float32)
x_gpu=gpuarray.to_gpu(x)
xf_gpu = gpuarray.empty(N,np.complex64)
plan=ft.cufftPlan1d(N,ft.CUFFT_Z2Z,1)
ft.cufftExecZ2Z(plan,x_gpu,xf_gpu,ft.CUFFT_FORWARD)
ft.cufftDestroy(plan)
xf=x_gpu.get()
but it gives the error:
runfile('/home/jesli/sk-cufft_test.py', wdir='/home/jesli')
Traceback (most recent call last):
File "", line 1, in <module>
runfile('/home/jesli/sk-cufft_test.py', wdir='/home/jesli')
File "/home/jesli/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "/home/jesli/sk-cufft_test.py", line 19, in <module>
ft.cufftExecZ2Z(plan,x_gpu,xf_gpu,ft.CUFFT_FORWARD)
File "/home/jesli/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 319, in cufftExecZ2Z
direction)
ArgumentError: argument 2: : wrong type
The transform directions (CUFFT_FORWARD, CUFFT_INVERSE) are already defined in the source code:
http://scikit-cuda.readthedocs.org/en/latest/_modules/skcuda/cufft.html
I want to know what went wrong with the code, or what argument the function expects.
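A hedged sketch of one likely fix, assuming the low-level skcuda wrapper wants raw integer device pointers rather than GPUArray objects, and that a CUFFT_Z2Z plan operates on double-precision complex data:
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import skcuda.cufft as ft
N = 100
x = np.asarray(np.random.random(N), np.complex128)  # Z2Z expects complex128, not float32
x_gpu = gpuarray.to_gpu(x)
xf_gpu = gpuarray.empty(N, np.complex128)
plan = ft.cufftPlan1d(N, ft.CUFFT_Z2Z, 1)
# pass the raw device pointers, not the GPUArray objects themselves
ft.cufftExecZ2Z(plan, int(x_gpu.gpudata), int(xf_gpu.gpudata), ft.CUFFT_FORWARD)
ft.cufftDestroy(plan)
xf = xf_gpu.get()  # read the transform from the output array (not x_gpu)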

How to convert set to float or integer

I am interested in plotting the unique values in an integer vector u against the number of times each of those unique values occurs in u (i.e., the frequency distribution of the unique values in u).
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer,word_tokenize
from nltk import FreqDist
import matplotlib
from matplotlib import pyplot as plt
txtwrds=state_union.words('2006-GWBush.txt')
vocab=set(w.lower() for w in txtwrds if w.isalpha())
vocab=nltk.Text(vocab)
fdist1=FreqDist(txtwrds)
u=[]
for w in vocab:
    u.append(fdist1[w])
x=FreqDist(u)
y=set(u)
print(len(x),len(y)) #Gives same vector length for x and y
plt.scatter(x,y) #This is what throws the error
plt.show()
As you can see in the last few lines of code, in order to get a new vector y of the unique values in u I run "y=set(u)", and I assign "x=FreqDist(u)". So far so good. The problem comes when I try to plot x and y using matplotlib's "scatter": I get "TypeError: float() argument must be a string or a number, not 'set'".
The full traceback:
Traceback (most recent call last):
File "C:/Python34/first_program.py", line 45, in <module>
plt.scatter(x,y)
File "C:\Python34\lib\site-packages\matplotlib\pyplot.py", line 3200, in scatter
linewidths=linewidths, verts=verts, **kwargs)
File "C:\Python34\lib\site-packages\matplotlib\axes\_axes.py", line 3674, in scatter
self.add_collection(collection)
File "C:\Python34\lib\site-packages\matplotlib\axes\_base.py", line 1477, in add_collection
self.update_datalim(collection.get_datalim(self.transData))
File "C:\Python34\lib\site-packages\matplotlib\collections.py", line 192, in get_datalim
offsets = np.asanyarray(offsets, np.float_)
File "C:\Python34\lib\site-packages\numpy\core\numeric.py", line 525, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'set'
Attempts at converting y to integer or float (y=int(y),y=float(y)) meet with errors like this:
Traceback (most recent call last):
File "C:/Python34/first_program.py", line 44, in <module>
y=int(y)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
FYI - I am using 32 bit python v3.4.3 on a Windows 7 64 bit machine. (There are some nltk bugs arising with 64 bit python v3.5, so have to use the earlier version.)
You can easily do this with pandas.DataFrame:
import pandas as pds
df = pds.DataFrame(data=list(txtwrds), columns=['word'])
df.reset_index(inplace=True) #just to have a column to count
df.groupby('word').count().plot()
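Alternatively, staying closer to the original code: scatter needs plain numeric sequences rather than a FreqDist and a set, so one option (a sketch under that assumption) is to build lists first:
freq_of_freqs = FreqDist(u)        # how often each unique value of u occurs
x = sorted(freq_of_freqs)          # the unique values themselves
y = [freq_of_freqs[v] for v in x]  # their counts
plt.scatter(x, y)
plt.show()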

Python: computing pairwise distances causes memory error

I want to compute the pairwise distances of 57832 vectors. Each vector has 200 dimensions. I am using pdist to compute the distances.
from scipy.spatial.distance import pdist
pairwise_distances = pdist(X, 'cosine')
# pdist is supposed to return a condensed numpy array with shape (57832*57831/2,).
However, this causes a memory error.
Traceback (most recent call last):
File "/home/munichong/git/DomainClassification/NameSuggestion#Verisign/classification_DMOZ/main.py", line 101, in <module>
result_clustering = clf_clustering.getCVResult(shuffle)
File "/home/munichong/git/DomainClassification/NameSuggestion#Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 158, in getCVResult
self.centroids_of_categories(X_train, y_train)
File "/home/munichong/git/DomainClassification/NameSuggestion#Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 103, in centroids_of_categories
cat_centroids.append( self.dpc.centroids(X_in_this_cat) )
File "/home/munichong/git/DomainClassification/NameSuggestion#Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 23, in centroids
distance_dict, rho_dict = self.compute_distances_and_rhos(X)
File "/home/munichong/git/DomainClassification/NameSuggestion#Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 59, in compute_distances_and_rhos
pairwise_distances = pdist(X, 'cosine')
File "/usr/local/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1185, in pdist
dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
MemoryError
My laptop has 16 GB of RAM. How should I fix this? Or is there a better way?
Doing matrix-based algorithms on large data sets is prohibitively expensive.
The memory requirements are straightforward to estimate: here, the condensed distance vector alone holds 57832*57831/2 doubles, roughly 12.5 GiB, which leaves almost no headroom in 16 GB of RAM (see the estimate below). Even when exploiting symmetry, many implementations max out at around 65,000 instances, and even 64-bit implementations on big machines eventually give up: a 1,000,000 x 1,000,000 matrix in double precision, exploiting symmetry, needs 4 TB of RAM.
Use better algorithms that don't need O(n^2) memory and runtime.
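A quick back-of-the-envelope check of what that pdist call tries to allocate (the np.zeros line in the traceback above):
n = 57832
condensed_len = n * (n - 1) // 2          # number of pairwise distances pdist stores
print(condensed_len * 8 / 1024.0 ** 3)    # float64 entries -> roughly 12.5 GiB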

Having Issues with an AssertionError when trying to use the psd() command in matplotlib

I'm trying to write a short script that takes a .csv file with some distance data and outputs the PSD plot for it. The code is here:
import math
import matplotlib.pyplot as plt
name = raw_input('File:')
data = open(name + '.csv', 'r')
distances = []
for row in data:
    distances.append(row.replace("\n",""))
for i in range(len(distances)):
    distances[i] = float(distances[i])
Pxx, freqs = plt.psd(distances, NFFT=16,Fs=2,detrend='detrend_mean',window='window_none',noverlap=128,sides='onesided',scale_by_freq=True)
plot(Pxx,freqs)
plt.savefig(name + 'psd.png', bbox_inches = 'tight')
As you can see, it's pretty simple. The csv file just contains one column of numbers, so distances is a vector.
The error I'm getting is as follows:
Traceback (most recent call last):
File "C:psdplot.py", line 15, in <module>
Pxx, freqs = plt.psd(distances, NFFT=16,Fs=2,detrend='detrend_mean',window='window_none',noverlap=128,sides='onesided',scale_by_freq=True)
File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 3029, in psd
sides=sides, scale_by_freq=scale_by_freq, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 8696, in psd
sides, scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 389, in psd
scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 423, in csd
noverlap, pad_to, sides, scale_by_freq)
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 251, in _spectral_helper
assert(len(window) == NFFT)
AssertionError
Could someone direct me on how to fix this? I'm sure it's rather obvious, but I haven't been able to find anything on fixing it in this particular context.
Thanks in advance!
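The failing assertion, len(window) == NFFT, suggests that this matplotlib version expects window (and detrend) to be arrays or callables rather than strings, so 'window_none' is being treated as an 11-character string. A hedged sketch of a call that should satisfy the assertion, using the mlab callables and keeping noverlap below NFFT:
import matplotlib.mlab as mlab
# window/detrend passed as callables, and noverlap reduced below NFFT (128 > 16 is not a valid overlap)
Pxx, freqs = plt.psd(distances, NFFT=16, Fs=2,
                     detrend=mlab.detrend_mean,
                     window=mlab.window_none,
                     noverlap=8, sides='onesided', scale_by_freq=True)
plt.savefig(name + 'psd.png', bbox_inches='tight')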
