Printing and reading Numpy arrays efficiently - python

I would like to print a Numpy array and then read it back. This is what I have done so far:
# printer
import numpy as np
N = 100
x = np.arange(N)
for xi in x:
    print(xi)

# reader
import numpy as np
N = 100
x = np.empty(N)
for i in range(N):
    x[i] = float(input())
This gets the job done, but I suspect it is not the most efficient way because of the repeated calls to input(). An alternative I considered is printing only once, reading only once, and then parsing what I read (see the sketch after the edit below). This approach has some similarities with this question. In contrast to that question, I have some extra information that could possibly be used to improve performance:
N is known in advance (to both programs)
Arrays are only 1D or 2D (of sizes N and NxN respectively)
Data are float
Data are fully trusted
Thanks in advance.
Edit: I should add that the value of N will not be that large; even N=1000 would be huge for my problem.
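A print-once/read-once version along the lines described above could look like the following. This is only a minimal sketch for the 1D case (the 2D case would print one such line per row and collect N lines on the reading side); the idea is simply to move the per-element work out of repeated print()/input() calls and into a single join/parse step.
# printer: emit the whole array as one space-separated line
import numpy as np
N = 100
x = np.arange(N, dtype=float)
print(' '.join(str(v) for v in x))

# reader: read that single line and convert it in one pass
import numpy as np
N = 100
x = np.array(input().split(), dtype=float)  # len(x) == N by construction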

Related

Fastest way to store 3D numpy array in a loop

I need to store a numpy array of shape (2000,720,1280) which is created in every loop. My code looks like:
U_list = []
for N_f in range(N):
    U = somefunction(N_f)
    U_list.append(U)
    del U
So I delete the matrix U in every loop because my RAM gets full.
Is this a good method for storing the matrices U, or would you recommend another solution? I compared the code to MATLAB, and MATLAB needs half the time to compute. I think the way U is stored in a list could be the reason.
Using this method will tell you right away whether you are able to store all of the U arrays at once. If N is so large that you can't allocate the full results array, you'll have to get creative; maybe save every 20 iterations into a pickle file or something (see the sketch after the code below).
import numpy as np
N = 20
shape = (2000, 720, 1280)
# Make sure to match the dtype returned by somefunction
results = np.zeros((N, *shape))
for N_f in range(N):
    results[N_f] = somefunction(N_f)
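If N is too large to hold all of the results in RAM at once, one option in the spirit of the chunked-save suggestion above is to write straight into a memory-mapped .npy file, so only the slice currently being written needs to be resident. This is just a sketch and not part of the original answer; the file name is an assumption, and N, shape and somefunction come from the question:
import numpy as np
# Creates U_all.npy on disk and maps it into memory; as before, match somefunction's dtype.
results = np.lib.format.open_memmap('U_all.npy', mode='w+',
                                     dtype=np.float64, shape=(N, *shape))
for N_f in range(N):
    results[N_f] = somefunction(N_f)  # written through to disk, not kept in RAM
results.flush()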

How to efficiently operate a large numpy array

I have a segment of code that operates on one large numpy array in order to fill another. Because the array is very large, could you please let me know whether there is an efficient way to achieve my goal? (I think the efficient way would operate on the array directly rather than through the for-loop.)
Thanks in advance; please find my code below:
N = 1000000000
rand = np.random.rand(N)
beta = np.zeros(N)
for i in range(0, N):
    if rand[i] < 0.5:
        beta[i] = 2.0*rand[i]
    else:
        beta[i] = 1.0/(2.0*(1.0-rand[i]))
You are basically losing the efficiency of numpy here by performing the processing in Python. The idea of numpy is to process the items in bulk, since it has efficient compiled algorithms behind the curtains that do the actual processing. You can see the Python end of numpy more as an "interface".
Now to answer your question: we can first construct an array of random numbers between 0 and 2 by multiplying by 2 up front:
rand = 2.0 * np.random.rand(N)
Next we can use np.where(..) [numpy-doc], which acts like a conditional selector. We pass it three "arrays": the first is an array of booleans that encodes the truthiness of the "condition", the second is the array of values to fill in where the condition is true, and the third is the array of values to fill in where the condition is false. We can then write it like:
N = 1000000000
rand = 2 * np.random.rand(N)
beta = np.where(rand < 1.0, rand, 1.0 / (2.0 - rand))
N = 1000000000 caused a MemoryError for me. Reducing to 100 for a minimal example.
You can use the np.where routine.
In both cases you are fundamentally iterating over your array and applying a function. However, np.where uses a much faster loop (it's compiled code, basically), while your "python" loop is interpreted and therefore really slow for a big N.
Here's an example of implementation.
N = 100
rand = np.random.rand(N)
beta = np.where(rand < 0.5, 2.0*rand, 1.0/(2.0*(1.0-rand)))
As other answers have pointed out, iterating over the elements of a numpy array in a Python loop should (and can) almost always be avoided. In most cases going from a Python loop to an array operation gives a speedup of ~100x.
However, if performance is absolutely critical, you can often squeeze out another factor of between 2x and 10x (in my experience) by using Cython.
Here's an example:
%%cython
cimport numpy as np
import numpy as np
cimport cython
from cython cimport floating

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef np.ndarray[floating, ndim=1] beta(np.ndarray[floating, ndim=1] arr):
    cdef:
        Py_ssize_t i
        Py_ssize_t N = arr.shape[0]
        np.ndarray[floating, ndim=1] result = np.zeros(N)
    for i in range(N):
        if arr[i] < 0.5:
            result[i] = 2.0*arr[i]
        else:
            result[i] = 1.0/(2.0*(1.0-arr[i]))
    return result
You would then call it as beta(rand).
As you can see, this allows you to use your original loop structure, but now using efficient typed native code. I get a speedup of ~2.5x compared to np.where.
It should be noted that in many cases this is not worth the extra effort compared to the numpy one-liner, but it may well be worth it where performance is critical.
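For reference, such a comparison could be timed in an IPython session roughly as follows. This is only a sketch of the measurement setup, not a claim about the resulting numbers; beta is the cpdef function compiled above:
import numpy as np
rand = np.random.rand(10**7)

%timeit np.where(rand < 0.5, 2.0*rand, 1.0/(2.0*(1.0-rand)))
%timeit beta(rand)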

python numpy arrays. How to slice multiple arrays in an efficient way?

I have a problem to solve and I cannot come up with a good solution.
To simplify it: I have a 10x10 array and I want to slice out "little arrays" of 3x3. Right now I do this the following way:
array = np.arange(100).reshape((10, 10))
patch = np.array(array[:3, :3])
for n in range(3, 10, 3):
    for m in range(3, 10, 3):
        patch = np.append(patch, array[n:n+3, m:m+3])
I basically create the numpy array patch with the first slice and append all the other slices afterwards. The problem is that this is horribly slow and does not make good use of numpy's slicing capabilities. I need to do this for a large number of much bigger arrays.
Can anyone give me any advice on how to make this more efficient?
A thousand thanks!
Your problem is entirely down to using numpy.append: append creates a new array each time you use it, so as your patch array gets bigger this takes progressively longer.
Instead, use a pre-sized array (you already know the final size of the patch array) and avoid making intermediate copies of any data.
# setup
x, y = 999, 999
array = np.arange(x * y)
array.shape = x, y
little_array_size = 3

# creates an array of "little arrays"
patch = np.empty(array.size, dtype=int)
patch.shape = -1, little_array_size, little_array_size

i = 0
for n in range(0, array.shape[0], little_array_size):
    for m in range(0, array.shape[1], little_array_size):
        # uses a view, so data is copied straight from array to patch
        patch[i, :] = array[n:n+little_array_size, m:m+little_array_size]
        i += 1

patch.shape = -1  # flattens the array
The above takes about a third of a second on my computer, two orders of magnitude faster than using numpy.append (20+ seconds).
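For completeness, the same patch extraction can be done without any Python loops at all by reshaping and swapping axes, provided the array dimensions are exact multiples of the patch size. This is a sketch and not part of the original answer:
import numpy as np

array = np.arange(999 * 999).reshape(999, 999)
k = 3
patches = (array
           .reshape(array.shape[0] // k, k, array.shape[1] // k, k)
           .swapaxes(1, 2)       # bring the two within-patch axes next to each other
           .reshape(-1, k, k))   # one k x k patch per entry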

Huge sparse matrix in python

I need to iteratively construct a huge sparse matrix in numpy/scipy. The initialization is done within a loop:
from scipy.sparse import dok_matrix, csr_matrix

def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    matrix = dok_matrix((dim_x, dim_y))
    for i in range(dim_x):
        # compute stuff in order to get j
        matrix[i, j] = 1.
    return matrix.tocsr()
Then I need to convert it to a csr_matrix for further computations like:
matrix = foo(...)
result = matrix.T.dot(x)
At the beginning this was working fine, but my matrices are getting bigger and bigger and my computer starts to crash. Is there a more elegant way of storing the matrix?
Basically I have the following requirements:
The matrix needs to store float values from 0. to 1.
I need to compute the transpose of the matrix
I need to compute the dot product with a x_dimensional vector
The matrix dimensions can be around 1*10^9 x 1*10^8
My RAM is being exhausted. I have read several posts on Stack Overflow and the rest of the internet ;) I found PyTables, which isn't really made for matrix computations, etc. Is there a better way?
For your case I would recommend using the data type np.int8 (or np.uint8), which requires only one byte per element:
matrix = dok_matrix((dim_x, dim_y), dtype=np.int8)
Constructing the csr_matrix directly will also let you push the maximum matrix size further:
from scipy.sparse import csr_matrix

def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    row = []
    col = []
    for i in range(dim_x):
        # compute stuff in order to get j
        row.append(i)
        col.append(j)
    data = np.ones_like(row, dtype=np.int8)
    return csr_matrix((data, (row, col)), shape=(dim_x, dim_y), dtype=np.int8)
You may have hit the limits of what Python can do for you, or you may be able to do a little more. Try setting a datatype of np.float32; if you're on a 64-bit machine, this reduced precision may cut your memory consumption. np.float16 may save even more memory, but your calculations may slow down (I've seen examples where processing took 10x as long):
matrix = dok_matrix((dim_x, dim_y), dtype=np.float32)
or, possibly much slower but with even less memory consumption:
matrix = dok_matrix((dim_x, dim_y), dtype=np.float16)
Another option: buy more system memory.
Finally, if you can avoid creating your matrix with dok_matrix and create it with csr_matrix instead (I don't know whether this is possible for your calculations), you may save a little of the overhead of the dict that dok_matrix uses.

fast dot product on all pair of rows

I have a 2d numpy array X = (xrows, xcols) and I want to apply the dot product to each pair of rows of the array to obtain another array of shape P = (xrow, xrow).
The code looks like the following:
P = np.zeros((xrow, xrow))
for i in range(xrow):
    for j in range(xrow):
        P[i, j] = numpy.dot(X[i], X[j])
This works well if the array X is small, but takes a lot of time for a huge X. Is there any way to make it faster or do it more pythonically so that it is fast?
That is obtained by doing result = X.dot(X.T)
When the array becomes large, it can be done in blocks, but depending on your numpy backend this should already parallelize across threads as much as possible. It seems that this is what you are looking for.
If for some reason you don't want to rely on that and do end up resorting to multiprocessing, you can try something along the lines of
import numpy as np
X = np.random.randn(1000, 100000)
block_size = 10000

# joblib now ships as a standalone package (sklearn.externals.joblib has been removed)
from joblib import Parallel, delayed

products = Parallel(n_jobs=10)(
    delayed(np.dot)(X[:, pos:pos + block_size], X.T[pos:pos + block_size])
    for pos in range(0, X.shape[1], block_size))
product = np.sum(products, axis=0)
I don't think this is useful for relatively small arrays. And threading can sometimes take care of this better as well.
This is 10% faster on my machine as it avoids loops:
numpy.matrix(X) * numpy.matrix(X.T)
but there is still 50% redundancy, since the symmetric half of the result is computed twice (see the sketch below).
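That redundancy can in principle be avoided by calling the underlying BLAS symmetric rank-k update (syrk) directly, since it only fills one triangle of the result. The following is only a sketch, not part of the answers above: it assumes scipy is available, that X has dtype float64, and that lower=0 means the upper triangle is the one written (worth verifying on your setup):
import numpy as np
from scipy.linalg.blas import dsyrk

X = np.random.randn(1000, 100000)
G = dsyrk(1.0, X, lower=0)          # assumed: only the upper triangle of X @ X.T is computed
G = np.triu(G) + np.triu(G, k=1).T  # mirror that triangle onto the lower half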
