I have a data set which has some categorical columns. Here is a small sample:
Temp precip dow tod
-20.44 snow 4 14.5
-22.69 snow 4 15.216666666666667
-21.52 snow 4 17.316666666666666
-21.52 snow 4 17.733333333333334
-20.51 snow 4 18.15
Here, the dow and precip are categorical, where as the others are continuous.
Is there a way I can create a OneHotEncoder for just those columns? I don't want to use pd.get_dummies because that won't put the data in the proper format unless of each dow and precip are in the new data.
Two things you could check out: sklearn-pandas and as mentioned by #Grr pipelines with this good intro.
So I prefer pipelines, as they are a tidy way, allow easy use with things like grid-seach, avoid leakage between folds in cross validation, etc. So I usually end up having a pipe like that (given you have precip LabelEncoded first):
from sklearn.pipeline import Pipeline, FeatureUnion, make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
class Columns(BaseEstimator, TransformerMixin):
def __init__(self, names=None):
self.names = names
def fit(self, X, y=None, **fit_params):
return self
def transform(self, X):
return X[self.names]
class Normalize(BaseEstimator, TransformerMixin):
def __init__(self, func=None, func_param={}):
self.func = func
self.func_param = func_param
def transform(self, X):
if self.func != None:
return self.func(X, **self.func_param)
else:
return X
def fit(self, X, y=None, **fit_params):
return self
cat_cols = ['precip', 'dow']
num_cols = ['Temp','tod']
pipe = Pipeline([
("features", FeatureUnion([
('numeric', make_pipeline(Columns(names=num_cols),Normalize())),
('categorical', make_pipeline(Columns(names=cat_cols),OneHotEncoder(sparse=False)))
])),
('model', LinearRegression())
])
The short answer is yes, but with some caveats.
First off you won't be able to use OneHotEncoder directly on the precip feature. You will need to encode those labels in to integers with LabelEncoder.
Secondly, if you just want to encode those features you can pass the proper values to the n_values and categorical_features parameters.
Example:
I will assume dow is day of the week, which will have seven values, and precip will have (rain, sleet, snow, and mix) as values.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
df2 = df.copy()
le = LabelEncoder()
le.fit(['rain', 'sleet', 'snow', 'mix'])
df2.precip = le.transform(df2.precip)
df2
Temp precip dow tod
0 -20.44 3 4 14.500000
1 -22.69 3 4 15.216667
2 -21.52 3 4 17.316667
3 -21.52 3 4 17.733333
4 -20.51 3 4 18.150000
# Initialize OneHotEncoder with 4 values for precip and 7 for dow.
ohe = OneHotEncoder(n_values=np.array([4,7]), categorical_features=[1,2])
X = ohe.fit_transform(df2)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
Ok that works, but you have to either mutate your data in place or create a copy an things can get a little messy. A more organized way to do this would be to use a Pipeline.
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
def get_precip(X):
le = LabelEncoder()
le.fit(['rain', 'sleet', 'snow', 'mix'])
return le.transform(X.precip).reshape(-1,1)
def get_dow(X):
return X.dow.values.reshape(-1,1)
def get_rest(X):
return X.drop(['precip', 'dow'], axis=1)
precip_trans = FunctionTransformer(get_precip, validate=False)
dow_trans = FunctionTransformer(get_dow, validate=False)
rest_trans = FunctionTransformer(get_rest, validate=False)
union = FeatureUnion([('precip', precip_trans), ('dow', dow_trans), ('rest', rest_trans)])
ohe = OneHotEncoder(n_values=[4,7], categorical_features=[0,1])
pipe = Pipeline([('union', union), ('one_hot', ohe)])
X = pipe.fit_transform(df)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
I do want to point out that in the upcoming release of sklearn v0.20 there will be a CategoricalEncoder which should make this kind of thing even easier.
I don't want to use pd.get_dummies because that won't put the data in
the proper format unless of each dow and precip are in the new data.
Assuming you want to encode but also maintain those two columns--are you sure this wouldn't work for you?
df = pd.DataFrame({
'temp': np.random.random(5) + 20.,
'precip': pd.Categorical(['snow', 'snow', 'rain', 'none', 'rain']),
'dow': pd.Categorical([4, 4, 4, 3, 1]),
'tod': np.random.random(5) + 10.
})
pd.concat((df[['dow', 'precip']],
pd.get_dummies(df, columns=['dow', 'precip'], drop_first=True)),
axis=1)
dow precip temp tod dow_3 dow_4 precip_rain precip_snow
0 4 snow 20.7019 10.4610 0 1 0 1
1 4 snow 20.0917 10.0174 0 1 0 1
2 4 rain 20.3978 10.5766 0 1 1 0
3 3 none 20.9804 10.0770 1 0 0 0
4 1 rain 20.3121 10.3584 0 0 1 0
In the case where you'll be interacting with new data that includes categories that df hasn't "seen," you can use
df['col'] = df['col'].cat.add_categories(...)
Where you pass a list of the set difference. This adds to the list of "recognized" categories for the resulting pd.Categorical object.
Related
i have a 2D numpy array. I'm trying to compute the similarities between rows and put it into a similarities array. Is this possible without loop? Thanks for your time!
# ratings.shape = (943, 1682)
arri = np.zeros(943)
arri = np.where(arri == 0)[0]
arrj = np.zeros(943)
arrj = np.where(arrj ==0)[0]
similarities = np.zeros((ratings.shape[0], ratings.shape[0]))
similarities[arri, arrj] = np.abs(ratings[arri]-ratings[arrj])
I want to make a 2D-array similarities in that similarities[i, j] is the differentiation between row i and row j in ratings
[ValueError: shape mismatch: value array of shape (943,1682) could not be broadcast to indexing result of shape (943,)]
[1][1]: https://i.stack.imgur.com/gtst9.png
The problem is how numpy iterates through the array when indexing a two-dimentional array with two arrays.
First some setup:
import numpy;
ratings = numpy.arange(1, 6)
indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]
ratings: [1 2 3 4 5]
indicesX: [[0][1][2][3][4]]
indicesY: [[0][1][2][3][4]]
Now lets see what your program produces:
similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[0])
similarities:
[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 2. 0. 0.]
[0. 0. 0. 3. 0.]
[0. 0. 0. 0. 4.]]
As you can see, numpy iterates over similarities basically like the following:
for i in range(5):
similarities[indicesX[i], indicesY[i]] = numpy.abs(ratings[i]-ratings[0])
similarities:
[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 2. 0. 0.]
[0. 0. 0. 3. 0.]
[0. 0. 0. 0. 4.]]
Now instead we need indices like the following to iterate through the entire array:
indecesX = [0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]
indecesY = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]
We do that the following:
# Reshape indicesX from (x,1) to (x,). Thats important for numpy.tile().
indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])
indicesY = numpy.repeat(indicesY, ratings.shape[0])
indicesX: [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]
indicesY: [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4]
Perfect! Now just call similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY]) again and we see:
similarities:
[[0. 1. 2. 3. 4.]
[1. 0. 1. 2. 3.]
[2. 1. 0. 1. 2.]
[3. 2. 1. 0. 1.]
[4. 3. 2. 1. 0.]]
Here the whole code again:
import numpy;
ratings = numpy.arange(1, 6)
indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]
similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])
indicesY = numpy.repeat(indicesY, ratings.shape[0])
similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY])
print(similarities)
PS
You commented on your own post to improve it. You should edit your question instead of commenting on it, when you want to improve it.
Im trying to get to understand how to use numpy for calculating a formula for different times. The way the code is written gives all the values where y is bigger than 0. I am experimenting how to get the values for all y's.
Is there someone who can explain me the part: ft = t * [y >= 0.0 ]. How do i use the parts within the brackets?
from numpy import *
g = 10.0
h0 = 10.0
t = arange(0, 10.1 ,0.1)
y = h0 - 0.5*g*t*t
ft = t * [y >= 0.0 ]
print(ft)
This is the output, but I would like to see all the values calculated. So i experimented a bit but i could not figure it out how to do it and how the; [y >= 0.0] part exactly works.
[[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]
If i use [y] instead of [y >= 0.0] i get the following:
[[ 0.000000e+00 9.950000e-01 1.960000e+00 2.865000e+00 3.680000e+00
4.375000e+00 4.920000e+00 5.285000e+00 5.440000e+00 5.355000e+00
5.000000e+00 4.345000e+00 3.360000e+00 2.015000e+00 2.800000e-01
-1.875000e+00 -4.480000e+00 -7.565000e+00 -1.116000e+01 -1.529500e+01
-2.000000e+01 -2.530500e+01 -3.124000e+01 -3.783500e+01 -4.512000e+01
-5.312500e+01 -6.188000e+01 -7.141500e+01 -8.176000e+01 -9.294500e+01
-1.050000e+02 -1.179550e+02 -1.318400e+02 -1.466850e+02 -1.625200e+02
-1.793750e+02 -1.972800e+02 -2.162650e+02 -2.363600e+02 -2.575950e+02
-2.800000e+02 -3.036050e+02 -3.284400e+02 -3.545350e+02 -3.819200e+02
-4.106250e+02 -4.406800e+02 -4.721150e+02 -5.049600e+02 -5.392450e+02
-5.750000e+02 -6.122550e+02 -6.510400e+02 -6.913850e+02 -7.333200e+02
-7.768750e+02 -8.220800e+02 -8.689650e+02 -9.175600e+02 -9.678950e+02
-1.020000e+03 -1.073905e+03 -1.129640e+03 -1.187235e+03 -1.246720e+03
-1.308125e+03 -1.371480e+03 -1.436815e+03 -1.504160e+03 -1.573545e+03
-1.645000e+03 -1.718555e+03 -1.794240e+03 -1.872085e+03 -1.952120e+03
-2.034375e+03 -2.118880e+03 -2.205665e+03 -2.294760e+03 -2.386195e+03
-2.480000e+03 -2.576205e+03 -2.674840e+03 -2.775935e+03 -2.879520e+03
-2.985625e+03 -3.094280e+03 -3.205515e+03 -3.319360e+03 -3.435845e+03
-3.555000e+03 -3.676855e+03 -3.801440e+03 -3.928785e+03 -4.058920e+03
-4.191875e+03 -4.327680e+03 -4.466365e+03 -4.607960e+03 -4.752495e+03
-4.900000e+03]]
I would like to know how i can use numpy to calculate at once all the outcomes of a formula for different time intervals.
Thanks,
y >= 0.0 gives you an array of Booleans which contain True/False depending on the fulfillment of the condition y >= 0.0. When you enclose it within [] as [y >= 0.0], you get a list which contains a single array of Booleans, as pointed out by #nicola in the comments below.
[array([ True, True, True, True, True, False, False, False,...
... False, False, False, False])]
Now you multiply this with your arange array which will give you 0 when the right hand side of * operator is False and will give you the actual value from the arange when the right hand side of * operator is True
The array [y >= 0.0] produces and array of booleans. i.e. 1 if y>=0 and 0 if not. That array of 1's and 0's is then multiplied by t.
It is not clear to me from your question however, what you are trying to do with it.
I have a bunch of points and need to select a subset of them, add a value to the x coordinates and store the information in the original points.
I need to do it without loops or intermediate assignments.
import numpy as np
points=np.array([[100. , 100. , 100. ],
[ 0. , -2.75, 0. ],
[ 0. , -2.75, 5. ],
[ 0. , -1.9 , 3.15],
[ 0. , -1.9 , 3.35]])
then trying:
points[[3,4,0]][:,[0]]+=2
or
points[[3,4,0]][:,[0]]=points[[3,4,0]][:,[0]]+2
the original points variable does not change.
Any ideas? I suspect I am missing some stupid stuff...
If you are looking to edit first column of those rows use:
points[[3,4,0], 0] += 2
points
#[[ 102. 100. 100. ]
# [ 0. -2.75 0. ]
# [ 0. -2.75 5. ]
# [ 2. -1.9 3.15]
# [ 2. -1.9 3.35]]
My question is about multi-threading in python. The problem I am working on is finding 80 percent similar arrays(lengths are 64) to a given array with the same length from 10 Million arrays. The problem is that although my code executes in 12.148 seconds when I iterate linearly inside a while loop, it doesn't execute in at least 28-30 seconds when I use multi threading. Both implementations are below. Any advice appreciated and please enlighten me, why does it make it slower to multi thread in this case?
First code:
import timeit
import numpy as np
ph = np.load('newDataPhoto.npy')
myPhoto1 = np.array([ 1. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 1. , 1. , 0. , 0.
, 0. , 1. , 1. , 0. , 1. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 1.
, 1. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 1. , 0. ])
start = timeit.default_timer()
kk=0
i=0
while i< 10000000:
u = np.count_nonzero(ph[i] != myPhoto1)
if u <= 14:
kk+=1
i+=1
print(kk)
stop = timeit.default_timer()
print stop-start
Second one(multi-threaded):
from threading import Thread
import numpy as np
import timeit
start = timeit.default_timer()
ph = np.load('newDataPhoto.npy')
pc = np.load('newDataPopCount.npy')
myPhoto1 = np.array([ 1. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 1. , 1. , 0. , 0.
, 0. , 1. , 1. , 0. , 1. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 1.
, 1. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 1. , 0. ])
def hamming_dist(left, right, name):
global kk
start = timeit.default_timer()
while left<=right:
if(np.count_nonzero(ph[left] != myPhoto1)<=14):
kk+=1
left+=1
stop=timeit.default_timer()
print name
print stop-start
def Main():
global kk
kk=0
t1 = Thread(target=hamming_dist, args=(0,2500000, 't1'))
t2 = Thread(target=hamming_dist, args=(2500001, 5000000, 't2'))
t3 = Thread(target=hamming_dist, args=(5000001, 7500000,'t3'))
t4 = Thread(target=hamming_dist, args=(7500001, 9999999, 't4'))
t1.start()
t2.start()
t3.start()
t4.start()
print ('main done')
if __name__ == "__main__":
Main()
And their outputs in order:
38
12.148679018
#####
main done
t4
26.4695241451
t2
27.4959039688
t3
27.5113890171
t1
27.5896160603
I solved the problem. I found out that threading is blocked by GIL which never allows to use more that the current processor. However using multiprocessing module worked. Here is the modification I made:
import numpy as np
import multiprocessing
import timeit
start = timeit.default_timer()
ph = np.load('newDataPhoto.npy')
pc = np.load('newDataPopCount.npy')
myPhoto1 = np.array([ 1. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 0. , 1. , 1. , 1. , 0. , 0.
, 0. , 1. , 1. , 0. , 1. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 1. , 1. , 0. , 0. , 1. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 1. , 0. , 0. , 1.
, 1. , 0. , 1. , 0. , 1. , 0. , 0. , 1. , 1. , 0. ])
def hamming_dist(left, right, name):
global kk
start = timeit.default_timer()
while left<=right:
if(np.count_nonzero(ph[left] != myPhoto1)<=14):
kk+=1
left+=1
stop=timeit.default_timer()
print name
print stop-start
def Main():
global kk
kk=0
t1 = multiprocessing.Process(target=hamming_dist, args=(0,2500000, 't1'))
t2 = multiprocessing.Process(target=hamming_dist, args=(2500001, 5000000, 't2'))
t3 = multiprocessing.Process(target=hamming_dist, args=(5000001, 7500000,'t3'))
t4 = multiprocessing.Process(target=hamming_dist, args=(7500001, 9999999, 't4'))
t1.start()
t2.start()
t3.start()
t4.start()
print ('main done')
if __name__ == "__main__":
Main()
I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).
In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.
Here's how I'm currently doing it:
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
for whichitem in range(minibatchsize):
for whichfilt in range(numfilters):
upsampled = T.set_subtensor(upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]], encoded[whichitem, whichfilt, :])
return upsampled
totalitems = minibatchsize * numfilters * numsamples
code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor # arbitrary positions within a bin
auxpos += (np.arange(4) * 5).reshape((1,1,-1)) # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))
print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()
get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()
(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)
The output is:
code:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
locations:
[[[ 0 6 12 18]
[ 4 5 11 17]
[ 3 9 10 16]]
[[ 2 8 14 15]
[ 1 7 13 19]
[ 0 6 12 18]]]
the un-pooled data:
[[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 2. 0.
0. 0. 0. 0. 3. 0.]
[ 0. 0. 0. 0. 4. 5. 0. 0. 0. 0. 0. 6. 0. 0.
0. 0. 0. 7. 0. 0.]
[ 0. 0. 0. 8. 0. 0. 0. 0. 0. 9. 10. 0. 0. 0.
0. 0. 11. 0. 0. 0.]]
[[ 0. 0. 12. 0. 0. 0. 0. 0. 13. 0. 0. 0. 0. 0.
14. 15. 0. 0. 0. 0.]
[ 0. 16. 0. 0. 0. 0. 0. 17. 0. 0. 0. 0. 0. 18.
0. 0. 0. 0. 0. 19.]
[ 20. 0. 0. 0. 0. 0. 21. 0. 0. 0. 0. 0. 22. 0.
0. 0. 0. 0. 23. 0.]]]
This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?
I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.
UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:
def upsamplecode2(encoded, auxpos):
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
add_to_flattened_indices = theano.shared(np.array([ [[(y + z * numfilters) * numsamples * upsampfactor for x in range(numsamples)] for y in range(numfilters)] for z in range(minibatchsize)], dtype=theano.config.floatX).flatten(), name="add_to_flattened_indices")
upsampled = T.set_subtensor(upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')], encoded.flatten()).reshape(upsampled.shape)
return upsampled
get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2
However, this is still not good efficiency-wise because when I run this (added on to the end of the above script) I find out that the Cuda libraries can't currently do the integer index manipulation efficiently:
ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
gpu_y = gpu_from_host(y)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
dtype=x.dtype)()])
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
(self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None
I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
totalitems = minibatchsize * numfilters * numsamples
code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor
auxpos += (np.arange(4) * 5).reshape((1,1,-1))
# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code
print "numpy output:"
print upsampled_np
# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)
print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})