I have a function foo that takes an NxM numpy array as an argument and returns a scalar value. I have an AxNxM numpy array data, over which I'd like to map foo to give me a resultant numpy array of length A.
Currently, I'm doing this:
result = numpy.array([foo(x) for x in data])
It works, but it seems like I'm not taking advantage of the numpy magic (and speed). Is there a better way?
I've looked at numpy.vectorize, and numpy.apply_along_axis, but neither works for a function of 2D arrays.
EDIT: I'm doing boosted regression on 24x24 image patches, so my AxNxM is something like 1000x24x24. What I called foo above applies a Haar-like feature to a patch (so, not terribly computationally intensive).
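For what it's worth, newer NumPy versions (1.12 and later) give np.vectorize a signature argument that declares core dimensions, so it can map a function of 2-D arrays over the leading axis. It still calls foo once per slice from Python, so expect convenience rather than speed. A minimal sketch, with a hypothetical foo (max minus min of a patch) standing in for the real one:

```python
import numpy as np

def foo(patch):
    # Hypothetical scalar-valued function of one NxM patch.
    return patch.max() - patch.min()

data = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# '(n,m)->()' declares that foo consumes an n x m core array and
# returns a scalar; the leading axis of data is looped over.
foo_stack = np.vectorize(foo, signature='(n,m)->()')
print(foo_stack(data))  # [11 11]
```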
If NxM is big (say, 100x100), then the cost of iterating over A will be amortized into basically nothing.
Say the array is 1000 X 100 X 100.
Iterating is O(1000), but the cumulative cost of the inner function is O(1000 X 100 X 100), i.e. 10,000 times more work. (Note: my terminology is a bit wonky, but I do know what I'm talking about.)
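Whether the loop matters depends on how much work foo does per patch relative to the per-iteration overhead. When foo is built from sums over rectangular regions, as Haar-like features usually are, the loop over A can often be replaced by reductions over axes 1 and 2 of the whole stack. A minimal sketch, assuming a hypothetical two-rectangle feature (left half minus right half); foo and foo_vectorized are illustrative names:

```python
import numpy as np

def foo(patch):
    # Hypothetical two-rectangle Haar-like feature on one NxM patch:
    # sum of the left half minus sum of the right half.
    half = patch.shape[1] // 2
    return patch[:, :half].sum() - patch[:, half:].sum()

def foo_vectorized(data):
    # Same feature for an AxNxM stack: reduce over axes 1 and 2 at once,
    # so the Python-level loop over A disappears.
    half = data.shape[2] // 2
    return data[:, :, :half].sum(axis=(1, 2)) - data[:, :, half:].sum(axis=(1, 2))

rng = np.random.default_rng(0)
data = rng.random((1000, 24, 24))

looped = np.array([foo(x) for x in data])
vectorized = foo_vectorized(data)
print(np.allclose(looped, vectorized))  # True
```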
I'm not sure, but you could try this:
result = numpy.empty(data.shape[0])
for i in range(len(data)):
    result[i] = foo(data[i])
You would save a bit of memory allocation on building the list... but the loop overhead would be greater.
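A related option that also avoids building the intermediate list is np.fromiter, which fills a single output array from a generator when the length is passed as count. A minimal sketch, with a hypothetical foo (the sum of a patch) standing in for the real feature:

```python
import numpy as np

def foo(patch):
    # Hypothetical stand-in for the real per-patch feature.
    return patch.sum()

data = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)

# Preallocated loop, as in the answer above.
result_loop = np.empty(data.shape[0])
for i in range(len(data)):
    result_loop[i] = foo(data[i])

# np.fromiter consumes the generator directly; count lets NumPy
# allocate the output once instead of growing it.
result_iter = np.fromiter((foo(x) for x in data), dtype=float, count=len(data))
print(result_iter)
```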
Or you could write a parallel version of the loop and split it across multiple processes. That could be a lot faster, depending on how intensive foo is (the speed-up would have to offset the cost of moving data between processes).
You can achieve that by reshaping your 3D array as a 2D array with the same leading dimension, and wrapping your function foo with a function that works on 1D arrays by reshaping them as required by foo. An example (using trace instead of foo):
import numpy as np

def apply2d_along_first(func2d, arr3d):
    a, n, m = arr3d.shape
    # Wrap func2d so it accepts one flattened row of length n*m.
    def func1d(arr1d):
        return func2d(arr1d.reshape((n, m)))
    # Flatten each NxM slice into a row, then apply along the last axis.
    arr2d = arr3d.reshape((a, n * m))
    return np.apply_along_axis(func1d, -1, arr2d)

A, N, M = 3, 4, 5
data = np.arange(A * N * M).reshape((A, N, M))
print(data)
print(apply2d_along_first(np.trace, data))
Output:
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]]
[[40 41 42 43 44]
[45 46 47 48 49]
[50 51 52 53 54]
[55 56 57 58 59]]]
[ 36 116 196]
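Note that apply_along_axis still calls the Python function once per slice, so it is a convenience rather than a speed-up. When foo is made of NumPy functions that accept axis arguments, the loop can disappear entirely. A sketch for the trace example used above:

```python
import numpy as np

A, N, M = 3, 4, 5
data = np.arange(A * N * M).reshape((A, N, M))

# np.trace accepts axis1/axis2, so it can reduce every 2-D slice at once:
# the trace is taken over axes 1 and 2, leaving one value per slice.
traces = np.trace(data, axis1=1, axis2=2)
print(traces)  # [ 36 116 196]

# Equivalent explicit loop, for comparison.
looped = np.array([np.trace(x) for x in data])
```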
I would like to know if there exists a similar way of doing this (Mathematica) in Python:
[Image: Mathematica code]
I have tried it in Python and it does not work. I have also tried numpy.put() and two simple for loops. These two approaches work properly, but I find them very time-consuming with larger matrices (3000×3000 elements, for example).
Here is the problem described in Python:
import numpy as np
a = np.arange(0, 25, 1).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])
a[p][:, p] = b
which leaves the matrix a unchanged.
Perhaps you are looking for this:
a[p[...,None], p] = b
Array a after the above assignment looks like this:
[[100 1 2 200 4]
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[300 16 17 400 19]
[ 20 21 22 23 24]]
As documented in Integer Array Indexing, the two integer index arrays are broadcast together and iterated together, which effectively indexes the locations a[0,0], a[0,3], a[3,0], and a[3,3]. The assignment statement then performs an element-wise assignment at these locations of a, using the respective element values from the right-hand side.
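The original attempt fails because a[p] uses integer-array indexing, which returns a copy, so a[p][:, p] = b writes into a temporary array that is then discarded. np.ix_ builds the same kind of broadcastable index pair as p[..., None], p. A short sketch reusing the arrays from the question:

```python
import numpy as np

a = np.arange(0, 25, 1).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])

# a[p] is a copy (advanced indexing), so this assignment never reaches a.
a[p][:, p] = b
print(a[0, 0])  # 0

# np.ix_(p, p) returns index arrays of shapes (2, 1) and (1, 2),
# selecting the 2x2 block of rows p and columns p in a itself.
a[np.ix_(p, p)] = b
print(a)
```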
I was recently wondering how I could bypass the following numpy behavior.
Starting with a simple example:
import numpy as np
a = np.array([[1,2,3,4,5,6,7,8,9,0], [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])
then:
b = a.copy()
b[:, [0,1,4,8]] = b[:, [0,1,4,8]] + 50
print(b)
...results in printing:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
but if one index is included twice in the index list:
c = a.copy()
c[:, [0,1,4,4,8]] = c[:, [0,1,4,4,8]] + 50
print(c)
giving:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
(In short, they do the same thing.)
Could I also make it so that, for index 4, the operation is executed twice?
Or, more practically: let index i appear r times. Can the above expression be applied r times, instead of numpy taking it into account only once? And what if we replace "50" by a value that differs for every occurrence of i?
For my current code, I used:
w[p1] = w[p1] + D[pix]
where "pix" and "p1" are numpy arrays with dtype int and the same length, in which some integers may appear multiple times.
(So one may have pix = [..., 1,1,1,2,2,3,...] together with p1 = [..., 21,32,13,23,11,78,...]; on its own, this takes into account only the first 1 and the corresponding 21 for index 1, and discards the rest.)
Of course, using a for loop would solve the problem easily. The point is that both the integers and the sizes of the arrays are huge, so for loops would cost a lot more computational resources than efficient numpy array routines. Any ideas, links to existing documentation, etc.?
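NumPy provides unbuffered in-place ufunc operations for this case: np.add.at(arr, indices, values) applies the addition once for every occurrence of an index, and the values may differ per occurrence. A minimal sketch using the example array above plus small hypothetical w, D, pix, and p1 arrays (not the real data):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])

# np.add.at is unbuffered: a repeated index is incremented once per
# occurrence, so column 4 below receives +50 twice.
c = a.copy()
np.add.at(c, (slice(None), [0, 1, 4, 4, 8]), 50)
print(c)

# Hypothetical 1-D data mirroring w[p1] = w[p1] + D[pix].
w = np.zeros(5)
D = np.array([10.0, 20.0, 30.0, 40.0])
pix = np.array([1, 1, 1, 2, 2, 3])
p1 = np.array([0, 2, 2, 4, 1, 2])

# Plain fancy assignment: for the repeated target index 2,
# only one of the three updates survives.
w_plain = w.copy()
w_plain[p1] = w_plain[p1] + D[pix]

# np.add.at accumulates every occurrence.
w_at = w.copy()
np.add.at(w_at, p1, D[pix])

# Alternative for 1-D targets: sum the values per index with bincount.
w_bc = w + np.bincount(p1, weights=D[pix], minlength=w.size)
print(w_at)
```

np.add.at has historically been slower than np.bincount on large 1-D problems, though recent NumPy releases have sped it up; timing both on the real data is the safest way to choose.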
I am doing some machine learning stuff in python/numpy in which I want to index a 2-dimensional ndarray with a 1-D ndarray, so that I get a 1-D array with the indexed values.
I got it to work with some ugly piece of code and I would like to know if there is a better way, because this just seems unnatural for such a nice language and module combination as python+numpy.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print(a)
print(b)
print(a[b, np.arange(0, a.shape[1]).reshape(1,a.shape[1])])
#Prints:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]
[9 7 5 3 1]
[[45 36 27 18 9]]
This is exactly what I want (even though it is technically a 2-D ndarray), but it seems very complicated. Is there a neater and tidier way?
Edit:
To clarify, I actually do not want a 1-D array; that was very poorly explained. I do want one dimension with length 1, because that is what I need for later processing, but this is easily achieved with a reshape() call. Sorry for the confusion, I mixed my actual code with the more general question.
You want a 1D array, yet you included a reshape call whose only purpose is to take the array from the format you want to a format you don't want.
Stop reshaping the arange output. Also, you don't need to specify the 0 start value explicitly:
result = a[b, np.arange(a.shape[1])]
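Since the question's edit asks for a leading dimension of length 1, that shape can be recovered from the 1-D result with np.newaxis, or obtained directly with np.take_along_axis (available since NumPy 1.15), which picks a[b[j], j] for each column j. A short sketch with the arrays from the question:

```python
import numpy as np

a = np.arange(50).reshape(10, 5)  # array to be indexed
b = np.arange(9, -1, -2)          # one row index per column

# 1-D result: row b[j] is paired with column j.
flat = a[b, np.arange(a.shape[1])]
print(flat)  # [45 36 27 18  9]

# Same values with a leading axis of length 1, shape (1, 5).
row = flat[np.newaxis, :]

# take_along_axis takes the row index from b for each column directly;
# the index array must have the same number of dimensions as a.
row2 = np.take_along_axis(a, b[np.newaxis, :], axis=0)
```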
You can just use np.diagonal to get what you want. No reshape or extra index array is needed. The tricky part here was to identify the pattern you want to obtain, which is basically the diagonal elements of the a[b] matrix.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print (np.diagonal(a[b]))
# [45 36 27 18 9]
As @user2357112 mentioned in the comments, the return value of np.diagonal is read-only. In my opinion, that would be a problem if you plan to modify the values in this final result. If you just want to print them or use them for further indexing, it should be fine.
As per the docs:
Starting in NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error.
In some future release, it will return a read/write view and writing to the returned array will alter your original array. The returned array will have the same type as the input array.
If you don’t write to the array returned by this function, then you can just ignore all of the above.
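A minimal check of the read-only behavior quoted above, with .copy() as the usual way to get a writable result:

```python
import numpy as np

a = np.arange(50).reshape(10, 5)
b = np.arange(9, -1, -2)

d = np.diagonal(a[b])
print(d.flags.writeable)  # False

try:
    d[0] = -1
except ValueError as exc:
    # Writing into the read-only diagonal view raises ValueError.
    print("write failed:", exc)

# .copy() gives an independent, writable array.
d_copy = np.diagonal(a[b]).copy()
d_copy[0] = -1
```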
Can you clarify what the [:, :5] part of the code does in the following code segment?
for i in range(5):
    weights = None
    test_inputs = testset[i][:, :5]
    test_inputs = test_inputs.astype(np.float32)
    test_answer = testset[i][:, :5]
    test_answer = code_answer(test_answer)
This is explained in the indexing guide of the NumPy manual. The syntax itself is valid Python, but its meaning for arrays is defined by NumPy; built-in lists do not support it.
If you have an array a, a[:] returns a view on the whole array (not a copy; assigning to it will change a), and a[:5] is a view on elements 0, 1, ..., 4.
NumPy allows multidimensional arrays to be indexed with a[i, j] instead of the pure Python style a[i][j].
This should cover all your cases.
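A quick sketch of both points above: a basic slice shares memory with the original array, and a[i, j] selects the same element as a[i][j]:

```python
import numpy as np

a = np.arange(10)
v = a[:5]     # basic slice: a view, no data copied
v[0] = 100    # writes through to a
print(a[0])   # 100

m = np.arange(12).reshape(3, 4)
# One tuple index versus two chained indexing steps: same element.
print(m[1, 2] == m[1][2])  # True
```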
This code probably uses numpy arrays.
NumPy defines more elaborate array slicing, similar to MATLAB style.
The [:, :5] means that from your array (probably a 2D array, returned from testset[i]) you take all the rows (designated by :) and the columns from the beginning up to, but not including, column 5.
Each part of the slicing expression follows regular Python slicing syntax.
The [:, :5] itself is actually interpreted as [(slice(None), slice(None, 5))], since in Python comma-separated values without parentheses form a tuple, and each colon expression inside square brackets becomes a slice object.
The array object can handle tuples that denote complex slicing patterns.
This is in short the meaning of this syntax.
For more information, you may want to visit the NumPy website; you can start at www.scipy.org.
As mentioned by others, the array being referenced is from the numpy package (see its home page).
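To see what Python actually hands to the array, a small sketch with a hypothetical ShowKey class that just returns the key it receives, followed by the equivalent explicit tuple of slices on a NumPy array:

```python
import numpy as np

class ShowKey:
    # Hypothetical helper: returns whatever key the [] operator passes in.
    def __getitem__(self, key):
        return key

print(ShowKey()[:, :5])  # (slice(None, None, None), slice(None, 5, None))

x = np.arange(30).reshape(5, 6)
cols = x[:, :5]
same = x[(slice(None), slice(None, 5))]  # explicit tuple of slice objects
print(np.array_equal(cols, same))  # True
print(cols.shape)                  # (5, 5)
```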
An example should help. First create a multi-dimensional array structure similar to what the code provided is manipulating:
import numpy as np
a=np.arange(36).reshape(6,6)
b=a.reshape(6,3,2).swapaxes(0,2)
print(b)
[[[0 6 12 18 24 30]
[2 8 14 20 26 32]
[4 10 16 22 28 34]]
[[1 7 13 19 25 31]
[3 9 15 21 27 33]
[5 11 17 23 29 35]]]
The [:,:5] syntax is an array-slicing mechanism that keeps all rows and drops every column after the fifth:
print(b[1][:,:5])
[[1 7 13 19 25]
[3 9 15 21 27]
[5 11 17 23 29]]
I couldn't get my 4 arrays of year, day of year, hour, and minute to concatenate the way I wanted, so I decided to test several variations on shorter arrays than my data.
I found that it worked using method "t" from my test code:
import numpy as np
a=np.array([[1, 2, 3, 4, 5, 6]])
b=np.array([[11, 12, 13, 14, 15, 16]])
c=np.array([[21, 22, 23, 24, 25, 26]])
d=np.array([[31, 32, 33, 34, 35, 36]])
print(a)
print(b)
print(c)
print(d)
q=np.concatenate((a, b, c, d), axis=0)
#concatenation along 1st axis
print(q)
t=np.concatenate((a.T, b.T, c.T, d.T), axis=1)
#transpose each array before concatenation along 2nd axis
print(t)
x=np.concatenate((a, b, c, d), axis=1)
#concatenation along 2nd axis
print(x)
But when I tried this with the larger arrays it behaved the same as method "q".
I found an alternative approach of using vstack over here that did what I wanted, but I am trying to figure out why concatenation sometimes works for this, but not always.
Thanks for any insights.
Also, here are the outputs of the code:
q:
[[ 1 2 3 4 5 6]
[11 12 13 14 15 16]
[21 22 23 24 25 26]
[31 32 33 34 35 36]]
t:
[[ 1 11 21 31]
[ 2 12 22 32]
[ 3 13 23 33]
[ 4 14 24 34]
[ 5 15 25 35]
[ 6 16 26 36]]
x:
[[ 1 2 3 4 5 6 11 12 13 14 15 16 21 22 23 24 25 26 31 32 33 34 35 36]]
EDIT: I added method t to the end of a section of the code that was already fixed with vstack, so you can compare how vstack will work with this data but not concatenate. Again, to clarify, I found a workaround already, but I don't know why the concatenate method doesn't seem to be consistent.
Here is the code:
import numpy as np
BAO10m=np.genfromtxt('BAO_010_2015176.dat', delimiter=",", usecols=range(0-6), dtype=[('h', int), ('year', int), ('day', int), ('time', int), ('temp', float)])
#10 meter weather readings at BAO tower site for June 25, 2015
hourBAO=BAO10m['time']//100
minuteBAO=BAO10m['time']%100
#print(hourBAO)
#print(minuteBAO)
#time arrays
dayBAO=BAO10m['day']
yearBAO=BAO10m['year']
#date arrays
datetimeBAO=np.vstack((yearBAO, dayBAO, hourBAO, minuteBAO))
#t=np.concatenate((a.T, b.T, c.T, d.T), axis=1) <this gave desired results in simple tests
#not working for this data, use vstack instead, with transposition after stack
print(datetimeBAO)
test=np.transpose(datetimeBAO)
#rotate array
print(test)
#this prints something that can be used for datetime
t=np.concatenate((yearBAO.T, dayBAO.T, hourBAO.T, minuteBAO.T), axis=1)
print(t)
#this prints a 1D array of all the year values, then all the day values, etc...
#but this method worked for shorter 1D arrays
The file I used can be found at this site.
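The difference appears to be dimensionality. The test arrays are written with double brackets, so they are 2-D with shape (1, 6) and .T turns each into a (6, 1) column. Fields taken from a structured genfromtxt result are 1-D, and .T leaves a 1-D array unchanged, so there is no second axis to concatenate along (recent NumPy versions raise an error for axis=1 on 1-D inputs). A minimal sketch with hypothetical date values, not taken from the BAO file:

```python
import numpy as np

# 2-D test array, as in the question: shape (1, 6).
a = np.array([[1, 2, 3, 4, 5, 6]])
print(a.shape, a.T.shape)  # (1, 6) (6, 1)

# Hypothetical 1-D columns, like fields of a structured array.
year = np.array([2015, 2015, 2015])
day = np.array([176, 176, 176])
hour = np.array([0, 0, 1])
minute = np.array([0, 30, 0])
print(year.shape, year.T.shape)  # (3,) (3,)

try:
    np.concatenate((year, day), axis=1)
except ValueError as exc:  # NumPy's AxisError subclasses ValueError
    print("concatenate failed:", exc)

# column_stack treats each 1-D array as a column: shape (3, 4).
rows = np.column_stack((year, day, hour, minute))

# np.stack(..., axis=1) and vstack(...).T give the same array.
rows2 = np.stack((year, day, hour, minute), axis=1)
rows3 = np.vstack((year, day, hour, minute)).T
```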