I have a function that returns a NumPy array every second, which I want to store in another array for reference. For example (array_a is the returned array):
array_a = [[ 25. 50. 25. 25. 50. ]
[ 1. 1. 1. 1. 1. ]]
array_collect = np.append(array_a,array_collect)
But when I print array_collect, I get one flattened array, not a bigger array with arrays inside it:
array_collect = [ 25. 50. 25. 25. 50.
1. 1. 1. 1. 1.
25. 50. 25. 25. 50.
1. 1. 1. 1. 1.
25. 50. 25. 25. 50. ]
What I want is:
array_collect = [ [[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]]
[[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]]
[[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]] ]
How do I get it?
You could use vstack:
array_collect = np.array([[25.,50.,25.,25.,50.],[1.,1.,1.,1.,1.]])
array_a = np.array([[2.,5.,2.,2.,5.],[1.,1.,1.,1.,1.]])
array_collect=np.vstack((array_collect,array_a))
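Note that vstack stacks along the first axis, so repeated stacking grows a (2n, 5) array rather than the nested 3D layout shown in the question; a final array_collect.reshape(-1, 2, 5) would recover that layout.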
However, if you know the total number of minutes in advance, it would be better to define your array first (e.g. using zeros) and gradually fill it - this way, it is easier to stay within memory limits.
no_minutes = 5 #say 5 minutes
array_collect = np.zeros((no_minutes,array_a.shape[0],array_a.shape[1]))
Then, for every minute, m
array_collect[m] = array_a
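Putting that together, a minimal runnable sketch of the preallocate-and-fill pattern (here array_a is simply reused each step as a stand-in for whatever your function returns):
import numpy as np
no_minutes = 5
array_a = np.array([[25., 50., 25., 25., 50.],
                    [1., 1., 1., 1., 1.]])
# preallocate one (2, 5) slot per minute
array_collect = np.zeros((no_minutes, array_a.shape[0], array_a.shape[1]))
for m in range(no_minutes):
    array_collect[m] = array_a  # in practice, the freshly returned array
print(array_collect.shape)  # (5, 2, 5)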
Just use np.concatenate() and reshape this way:
import numpy as np
array_collect = np.array([[25.,50.,25.,25.,50.],[1.,1.,1.,1.,1.]])
array_a = np.array([[2.,5.,2.,2.,5.],[1.,1.,1.,1.,1.]])
array_collect = np.concatenate((array_collect,array_a),axis=0).reshape(2,2,5)
>>
[[[ 25. 50. 25. 25. 50.]
[ 1. 1. 1. 1. 1.]]
[[ 2. 5. 2. 2. 5.]
[ 1. 1. 1. 1. 1.]]]
I found it, this can be done by using:
np.reshape()
The new array that is formed can be reshaped using
y = np.reshape(y, (a, b, c))
where a is the number of arrays stored and (b, c) is the shape of the original array.
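For completeness, a minimal sketch of that append-then-reshape route, assuming two (2, 5) arrays have been collected:
import numpy as np
array_a = np.array([[25., 50., 25., 25., 50.],
                    [1., 1., 1., 1., 1.]])
array_collect = np.append(array_a, array_a)  # np.append flattens, giving shape (20,)
array_collect = np.reshape(array_collect, (2, 2, 5))  # a=2 arrays of shape (b, c)=(2, 5)
print(array_collect.shape)  # (2, 2, 5)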
I have the following input:
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
15. 16. 17. 18. 19.]
Expected output:
[ 0. 0. 0. 0. 0. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
0. 0. 0. 0. 0.]
Current code:
from numpy import linspace
input_list = linspace(0,20,20, endpoint = False)
input_list[:5] = 0
input_list[15:] = 0
print(input_list)
I'm wondering if there are more elegant/pythonic ways of doing it?
I mean, you could do this if you just wanted that range.
list(range(5,15))
Or, if you want to keep the middle slice and pad the ends with zeros (input_list is a NumPy array, so convert the slice to a list first):
[0]*5 + list(input_list[5:15]) + [0]*5
Or if it's conditional:
[x if 4 < x < 15 else 0 for x in input_list]
Try a list comprehension:
l1 = [0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.]
l2 = [x if x in range(5, 15) else 0. for x in l1]
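If you would rather stay in NumPy than drop back to lists, a sketch using np.where with an index mask (bounds taken from the question):
import numpy as np
input_list = np.linspace(0, 20, 20, endpoint=False)
idx = np.arange(len(input_list))
result = np.where((idx >= 5) & (idx < 15), input_list, 0.)
print(result)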
I have a bunch of points and need to select a subset of them, add a value to the x coordinates and store the information in the original points.
I need to do it without loops or intermediate assignments.
import numpy as np
points=np.array([[100. , 100. , 100. ],
[ 0. , -2.75, 0. ],
[ 0. , -2.75, 5. ],
[ 0. , -1.9 , 3.15],
[ 0. , -1.9 , 3.35]])
then trying:
points[[3,4,0]][:,[0]]+=2
or
points[[3,4,0]][:,[0]]=points[[3,4,0]][:,[0]]+2
the original points variable does not change.
Any ideas? I suspect I am missing something obvious...
If you are looking to edit the first column of those rows, use:
points[[3,4,0], 0] += 2
points
#[[ 102. 100. 100. ]
# [ 0. -2.75 0. ]
# [ 0. -2.75 5. ]
# [ 2. -1.9 3.15]
# [ 2. -1.9 3.35]]
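For the why: indexing with a list, as in points[[3,4,0]], is fancy indexing and returns a copy, so the chained += in the question modifies that copy and throws it away. A quick check:
import numpy as np
points = np.array([[100., 100., 100.],
                   [0., -2.75, 0.]])
print(np.shares_memory(points, points[[1, 0]]))  # False: fancy indexing copies
Supplying both index arrays in a single indexing step, as in the answer above, turns the whole thing into one in-place assignment.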
I have a 2D array a and a 2D array b. I need to calculate c = a/b,
but the result contains some inf or NaN values. How can I check for them with NumPy and set them to np.nan?
Here is my code:
import numpy as np
a=np.asarray([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
b=np.asarray([[1,2,0,4,5],[1,2,0,4,5],[1,2,0,4,5],[1,2,3,4,5]])
c=a/b
b=np.where(isinstance(c, float),np.nan,c)
I am not sure, correct me if I am wrong: you are referring to the inf values in c, i.e. after calculating c = a/b.
Here is some sample code:
import numpy as np
np.seterr(divide='ignore', invalid='ignore')  # avoid "RuntimeWarning: divide by zero encountered in true_divide"
a=np.asarray([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
b=np.asarray([[1,2,0,4,5],[1,2,0,4,5],[1,2,0,4,5],[1,2,3,4,5]])
c=a/b
print(c)
[[ 1. 1. inf 1. 1.]
[ 1. 1. inf 1. 1.]
[ 1. 1. inf 1. 1.]
[ 1. 1. 1. 1. 1.]]
c[np.isinf(c)] = np.nan  # find the inf entries and replace them with nan
print(c)
[[ 1. 1. nan 1. 1.]
[ 1. 1. nan 1. 1.]
[ 1. 1. nan 1. 1.]
[ 1. 1. 1. 1. 1.]]
Hope it helps!
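If you also want to catch the NaN that 0/0 produces (np.isinf misses it), one sketch is to scope the warning suppression with np.errstate and keep only finite values:
import numpy as np
a = np.asarray([[1., 2., 3.], [0., 2., 3.]])
b = np.asarray([[1., 0., 3.], [0., 2., 3.]])
with np.errstate(divide='ignore', invalid='ignore'):
    c = a / b
c = np.where(np.isfinite(c), c, np.nan)  # both inf and nan end up as nan
print(c)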
I have a pandas dataframe like this, where each ID is an observation with variables attr1, attr2 and attr3:
ID attr1 attr2 attr3
20 2 1 2
10 1 3 1
5 2 2 4
7 1 2 1
16 1 2 3
28 1 1 3
35 1 1 1
40 1 2 3
46 1 2 3
21 3 1 3
and made a similarity matrix that I want to use, in which the IDs are compared by the sum of their pairwise attribute differences:
[[ 0. 4. 3. 3. 3. 2. 2. 3. 3. 2.]
[ 4. 0. 5. 1. 3. 4. 2. 3. 3. 6.]
[ 3. 5. 0. 4. 2. 3. 5. 2. 2. 3.]
[ 3. 1. 4. 0. 2. 3. 1. 2. 2. 5.]
[ 3. 3. 2. 2. 0. 1. 3. 0. 0. 3.]
[ 2. 4. 3. 3. 1. 0. 2. 1. 1. 2.]
[ 2. 2. 5. 1. 3. 2. 0. 3. 3. 4.]
[ 3. 3. 2. 2. 0. 1. 3. 0. 0. 3.]
[ 3. 3. 2. 2. 0. 1. 3. 0. 0. 3.]
[ 2. 6. 3. 5. 3. 2. 4. 3. 3. 0.]]
I tried DBSCAN from sklearn for clustering the data, but it seems only the clusters themselves are labeled? I want to find the ID for the data points in the visualization later. So I only want to cluster the differences between the IDs, not the IDs themselves. Is there another algorithm better suited to this kind of data, or a way to label the distance-matrix values so it can be used with DBSCAN or another method?
P.S. the dataset has over 50 attributes and 10,000 observations.
The labels_ attribute will give you an array with one label per training data point. The first entry of that array is the label of your first training data point, and so on.
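A minimal sketch of wiring the precomputed distance matrix into DBSCAN and reading the per-ID labels back (the eps and min_samples values here are placeholders to tune, not recommendations):
import numpy as np
from sklearn.cluster import DBSCAN
ids = [20, 10, 5, 7, 16, 28, 35, 40, 46, 21]
dist = np.array([[0., 4., 3., 3., 3., 2., 2., 3., 3., 2.],
                 [4., 0., 5., 1., 3., 4., 2., 3., 3., 6.],
                 [3., 5., 0., 4., 2., 3., 5., 2., 2., 3.],
                 [3., 1., 4., 0., 2., 3., 1., 2., 2., 5.],
                 [3., 3., 2., 2., 0., 1., 3., 0., 0., 3.],
                 [2., 4., 3., 3., 1., 0., 2., 1., 1., 2.],
                 [2., 2., 5., 1., 3., 2., 0., 3., 3., 4.],
                 [3., 3., 2., 2., 0., 1., 3., 0., 0., 3.],
                 [3., 3., 2., 2., 0., 1., 3., 0., 0., 3.],
                 [2., 6., 3., 5., 3., 2., 4., 3., 3., 0.]])
db = DBSCAN(eps=1.5, min_samples=2, metric='precomputed').fit(dist)
for id_, label in zip(ids, db.labels_):
    print(id_, label)  # -1 marks noise points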
I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).
In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.
Here's how I'm currently doing it:
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    for whichitem in range(minibatchsize):
        for whichfilt in range(numfilters):
            # scatter this row's values to their stored positions along the time axis
            upsampled = T.set_subtensor(
                upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]],
                encoded[whichitem, whichfilt, :])
    return upsampled
totalitems = minibatchsize * numfilters * numsamples
code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor # arbitrary positions within a bin
auxpos += (np.arange(numsamples) * upsampfactor).reshape((1, 1, -1)) # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))
print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()
get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()
(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)
The output is:
code:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
locations:
[[[ 0 6 12 18]
[ 4 5 11 17]
[ 3 9 10 16]]
[[ 2 8 14 15]
[ 1 7 13 19]
[ 0 6 12 18]]]
the un-pooled data:
[[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 2. 0.
0. 0. 0. 0. 3. 0.]
[ 0. 0. 0. 0. 4. 5. 0. 0. 0. 0. 0. 6. 0. 0.
0. 0. 0. 7. 0. 0.]
[ 0. 0. 0. 8. 0. 0. 0. 0. 0. 9. 10. 0. 0. 0.
0. 0. 11. 0. 0. 0.]]
[[ 0. 0. 12. 0. 0. 0. 0. 0. 13. 0. 0. 0. 0. 0.
14. 15. 0. 0. 0. 0.]
[ 0. 16. 0. 0. 0. 0. 0. 17. 0. 0. 0. 0. 0. 18.
0. 0. 0. 0. 0. 19.]
[ 20. 0. 0. 0. 0. 0. 21. 0. 0. 0. 0. 0. 22. 0.
0. 0. 0. 0. 23. 0.]]]
This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?
I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.
UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:
def upsamplecode2(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    # offset of each (item, filter) row within the flattened upsampled tensor
    add_to_flattened_indices = theano.shared(
        np.array([[[(y + z * numfilters) * numsamples * upsampfactor
                    for x in range(numsamples)]
                   for y in range(numfilters)]
                  for z in range(minibatchsize)],
                 dtype=theano.config.floatX).flatten(),
        name="add_to_flattened_indices")
    upsampled = T.set_subtensor(
        upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')],
        encoded.flatten()).reshape(upsampled.shape)
    return upsampled
get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2
However, this is still not good efficiency-wise, because when I run it (appended to the end of the above script) I find that the CUDA libraries can't currently do the integer index manipulation efficiently:
ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
gpu_y = gpu_from_host(y)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
dtype=x.dtype)()])
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
(self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None
I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
totalitems = minibatchsize * numfilters * numsamples
code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor
auxpos += (np.arange(numsamples) * upsampfactor).reshape((1, 1, -1))
# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
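# broadcast the item and filter index arrays against auxpos so every value
# lands at its stored position along the last axis in one vectorized assignment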
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code
print "numpy output:"
print upsampled_np
# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)
print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})
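The trick here is that the two arange index arrays broadcast against positions to form a full (item, filter, sample) index for every element, so the entire scatter becomes a single set_subtensor call with no Python-level loop.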