How to calculate a formula for every value in an array? - python

I'm trying to understand how to use numpy to calculate a formula for different times. The way the code is written, it gives all the values where y is bigger than 0. I am experimenting with how to get the values for all y's.
Can someone explain the part ft = t * [y >= 0.0] to me? How do I use the part within the brackets?
from numpy import *
g = 10.0
h0 = 10.0
t = arange(0, 10.1 ,0.1)
y = h0 - 0.5*g*t*t
ft = t * [y >= 0.0 ]
print(ft)
This is the output, but I would like to see all the values calculated. So I experimented a bit, but I could not figure out how to do it, or how the [y >= 0.0] part exactly works.
[[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]
If I use [y] instead of [y >= 0.0] I get the following:
[[ 0.000000e+00 9.950000e-01 1.960000e+00 2.865000e+00 3.680000e+00
4.375000e+00 4.920000e+00 5.285000e+00 5.440000e+00 5.355000e+00
5.000000e+00 4.345000e+00 3.360000e+00 2.015000e+00 2.800000e-01
-1.875000e+00 -4.480000e+00 -7.565000e+00 -1.116000e+01 -1.529500e+01
-2.000000e+01 -2.530500e+01 -3.124000e+01 -3.783500e+01 -4.512000e+01
-5.312500e+01 -6.188000e+01 -7.141500e+01 -8.176000e+01 -9.294500e+01
-1.050000e+02 -1.179550e+02 -1.318400e+02 -1.466850e+02 -1.625200e+02
-1.793750e+02 -1.972800e+02 -2.162650e+02 -2.363600e+02 -2.575950e+02
-2.800000e+02 -3.036050e+02 -3.284400e+02 -3.545350e+02 -3.819200e+02
-4.106250e+02 -4.406800e+02 -4.721150e+02 -5.049600e+02 -5.392450e+02
-5.750000e+02 -6.122550e+02 -6.510400e+02 -6.913850e+02 -7.333200e+02
-7.768750e+02 -8.220800e+02 -8.689650e+02 -9.175600e+02 -9.678950e+02
-1.020000e+03 -1.073905e+03 -1.129640e+03 -1.187235e+03 -1.246720e+03
-1.308125e+03 -1.371480e+03 -1.436815e+03 -1.504160e+03 -1.573545e+03
-1.645000e+03 -1.718555e+03 -1.794240e+03 -1.872085e+03 -1.952120e+03
-2.034375e+03 -2.118880e+03 -2.205665e+03 -2.294760e+03 -2.386195e+03
-2.480000e+03 -2.576205e+03 -2.674840e+03 -2.775935e+03 -2.879520e+03
-2.985625e+03 -3.094280e+03 -3.205515e+03 -3.319360e+03 -3.435845e+03
-3.555000e+03 -3.676855e+03 -3.801440e+03 -3.928785e+03 -4.058920e+03
-4.191875e+03 -4.327680e+03 -4.466365e+03 -4.607960e+03 -4.752495e+03
-4.900000e+03]]
I would like to know how I can use numpy to calculate all the outcomes of a formula for different time intervals at once.
Thanks,

y >= 0.0 gives you an array of Booleans which contains True/False depending on the fulfillment of the condition y >= 0.0. When you enclose it within [] as [y >= 0.0], you get a list which contains a single array of Booleans, as pointed out by @nicola in the comments below.
[array([ True, True, True, True, True, False, False, False,...
... False, False, False, False])]
Now you multiply this with your arange array, which gives you 0 wherever the right-hand side of the * operator is False and the actual value from the arange wherever it is True.
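To answer the questioner's actual goal, here is a minimal sketch (same names as in the question; the np.where line is my suggestion, not part of the original code). Note that y itself already holds the formula evaluated at every t at once, and dropping the square brackets multiplies t by the Boolean array directly:
import numpy as np
g = 10.0
h0 = 10.0
t = np.arange(0, 10.1, 0.1)
y = h0 - 0.5 * g * t * t                # the formula, computed for all times in one go
print(y)                                # all outcomes, positive and negative
ft = t * (y >= 0.0)                     # Boolean array acts as 1s and 0s: times with y < 0 become 0
clamped = np.where(y >= 0.0, y, 0.0)    # alternative: keep y where non-negative, else 0
print(ft)
print(clamped)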

The expression [y >= 0.0] produces an array of Booleans, i.e. 1 where y >= 0 and 0 where not. That array of 1's and 0's is then multiplied by t.
It is not clear to me from your question, however, what you are trying to do with it.

Related

Best way to calculate similarity between rows of a matrix

I'm a Python noob, so sorry if the question sounds too basic for anyone reading.
I have this matrix of type numpy.matrix (I omit all the code that generates it):
[[0. 0.2342598 0. 0. 0. 0.31308172
0. 0. 0. 0. 0.31308172 0.
0. 0.86549525 0. ]
[0. 0.2342598 0. 0. 0. 0.31308172
0. 0. 0. 0. 0.31308172 0.
0. 0.86549525 0. ]
[0.22575551 0.72375361 0. 0.19345532 0.22575551 0.19345532
0. 0.38691064 0.19345532 0.19345532 0.19345532 0.19345532
0. 0. 0. ]
[0.22575551 0.72375361 0. 0.19345532 0.22575551 0.19345532
0. 0.38691064 0.19345532 0.19345532 0.19345532 0.19345532
0. 0. 0. ]
[0. 0.64936739 0. 0.28928716 0. 0.
0.39985833 0.28928716 0.28928716 0.28928716 0. 0.28928716
0. 0. 0. ]
[0.26302218 0.50593649 0.37991833 0.22539002 0.26302218 0.
0.31153847 0.11269501 0.11269501 0.45078005 0. 0.11269501
0.18995916 0. 0.18995916]]
By using sklearn.metrics.pairwise.cosine_similarity I easily get the similarity between two rows.
For example, cosine_similarity(X[0], X[1]) gives me a numpy.ndarray that contains only one element: a float value between 0.0 and 1.0 that represents the level of similarity between X[0] and X[1]. I finally get the value inside the array with cosine_similarity(X[0], X[1])[0][0].item().
Problem is, I don't need to compare only two rows for similarity: I need to compare X[0] to every other row and find the one most similar to X[0].
What's the best (most pythonic, performant, elegant, practical...) way to do it?
Any help is appreciated.
Update: sorry, I forgot to mention what actually works for me:
def calculate():
    h = 0.0
    e = -1
    for i in range(1, len(m)):
        if cosine_similarity(m[0], m[i])[0][0].item() >= h:
            h = cosine_similarity(m[0], m[i])[0][0].item()
            e = i
    return e
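For what it's worth, a vectorized sketch of the same search (my suggestion, assuming m is the matrix from the question; cosine_similarity accepts whole matrices and returns all pairwise similarities in a single call):
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

X = np.asarray(m)                                # works for numpy.matrix too
sims = cosine_similarity(X[0:1], X[1:]).ravel()  # similarities of row 0 vs. rows 1..n-1
e = int(sims.argmax()) + 1                       # +1 because row 0 was excluded
This avoids calling cosine_similarity once (or twice) per row inside a loop.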

Is there a python function for assigning values to several elements in a list?

I used the code shown below to create a list of lists.
Code:
import numpy as np

num = 782
sol = 4
pop_size = [sol, num]
initial_population_1 = np.random.uniform(low=0.0, high=0.0, size=pop_size)
The list of lists is shown below:
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
How can I randomly assign five values that are greater than 0 but less than 10 to five elements in each inner list?
Thank you very much!
So, you have a list of lists, specifically a list of 4 lists, each containing 782 elements, all 0.0, and you want to set 5 randomly chosen elements in each to a random value between 0 and 10.
I'd like to mention that, as you are using Numpy, there is np.zeros(shape), which provides you with a zero-filled array, but whatever…
From your question it's not clear whether you want to avoid using the same location twice, but let's assume you want to assign a random value to exactly 5 distinct entries in each row:
for row in initial_population_1:
    locations_used_in_this_row = 0
    while locations_used_in_this_row != 5:
        column = np.random.randint(num)
        if row[column] == 0.0:               # only touch columns not already used
            row[column] = np.random.rand() * 10
            locations_used_in_this_row += 1
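A shorter sketch of the same idea, assuming duplicate locations should indeed be avoided: np.random.choice with replace=False draws 5 distinct column indices per row, so no retry loop is needed:
for row in initial_population_1:
    cols = np.random.choice(num, size=5, replace=False)        # 5 distinct columns
    row[cols] = np.random.uniform(low=0.0, high=10.0, size=5)  # 5 values in [0, 10)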

How to create a rectangular grid with custom start point and step value

I'm working on a project where I need to calibrate two cameras. As you know, one needs to define planar grid points in the 3D world and find their correspondences on the image plane. The first camera has the following 3D grid points:
import cv2 as cv
import numpy as np
WPoints_cam1 = np.zeros((9*3,3), np.float64)
WPoints_cam1[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
print(WPoints_cam1)
[[0. 0. 0. ]# world coordinate center
[0.4 0. 0. ]
[0.8 0. 0. ]
[1.2 0. 0. ]
[1.6 0. 0. ]
[2. 0. 0. ]
[2.4 0. 0. ]
[2.8 0. 0. ]
[3.2 0. 0. ]
[0. 0.4 0. ]
[0.4 0.4 0. ]
[0.8 0.4 0. ]
[1.2 0.4 0. ]
[1.6 0.4 0. ]
[2. 0.4 0. ]
[2.4 0.4 0. ]
[2.8 0.4 0. ]
[3.2 0.4 0. ]
[0. 0.8 0. ]
[0.4 0.8 0. ]
[0.8 0.8 0. ]
[1.2 0.8 0. ]
[1.6 0.8 0. ]
[2. 0.8 0. ]
[2.4 0.8 0. ]
[2.8 0.8 0. ]
[3.2 0.8 0. ]]
As seen above, the first grid (for the first camera) starts from the defined reference 3D point (0, 0, 0) and ends at the point (3.2, 0.8, 0), with a constant offset of 0.4 and a 9x3 dimension.
Note that all Z coordinates were set to Z=0 (Zhengyou Zhang calibration).
Now my question: since the second grid (for the second camera) must also refer to the defined 3D coordinate center (0, 0, 0), I need to define a grid that starts at (3.6, 0, 0) and ends at (6.8, 0.8, 0), with the same offset of 0.4 and the same 9x3 dimension.
I believe this is easy to do, but I can't think outside the box due to my beginner level of experience.
Would appreciate some help, and thanks in advance.
You can scale each column like this:
np.mgrid[0:8, 0:3].T.reshape(-1,2) * np.array([(7.8 - 3.6) / 7, 0.4]) + np.array([3.6, 0])
or combine it into a scaling matrix (and then add on a vector for the translation):
np.mgrid[0:8, 0:3].T.reshape(-1,2) @ np.array([[(7.8 - 3.6) / 7, 0], [0, 0.4]]).T + np.array([3.6, 0])
Regarding where (7.8 - 3.6) / 7 comes from: the numerator should be self-evident. The denominator is the same difference but for your original index range; with 0:8 the max index is 7 and the min is 0, so the denominator becomes 7 - 0.
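If the second grid should keep the question's exact 0.4 spacing, a minimal sketch is to reuse the question's own construction and translate it along x (WPoints_cam2 is a name assumed here, not from the original answer):
import numpy as np

WPoints_cam2 = np.zeros((9*3, 3), np.float64)
WPoints_cam2[:, :2] = np.mgrid[0:9, 0:3].T.reshape(-1, 2) * 0.4 + np.array([3.6, 0.0])
# x now runs from 3.6 to 6.8 and y from 0 to 0.8, both in steps of 0.4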

Odd behavior of using += with numpy.array and numpy.ma.array

Can anyone explain the following result to me?
I know it is not as one would usually do this operation, but I found this result odd.
import numpy as np
a = np.ma.masked_where(np.arange(20)>10,np.arange(20))
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
c = np.zeros(a.shape)
d = np.zeros(a.shape)
c[~a.mask] += b[~a.mask]
print(b[~a.mask])
#masked_array(data=[--, --, --, --, --, --, --, --,--, --, --],
# mask=[ True, True, True, True, True, True, True, True, True, True, True],
# fill_value=999999,
# dtype=int64)
print(c)
#[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d[~a.mask] = d[~a.mask] + b[~a.mask]
print(d)
#[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
I expected c not to change, but I guess something related to objects in memory is going on here. Also, += keeps the original object, while = and + create a new d.
I just don't really understand where the data comes from that's added to c.
I will start with a simpler example for better understanding:
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
#b: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#b.data: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
c = np.zeros(b.shape)
#c: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d = np.zeros(b.shape)
#d: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
c += b
#c: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
d = d + b
#d: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#d.data: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
The first operation, c += b, is an in-place operation. In other words, it is equivalent to c = type(c).__iadd__(c, b), which does the addition according to the type of c. Since c is not a masked array, the data of b is used as if it were unmasked.
On the other hand, d = d + b is equivalent to d = np.MaskedArray.__add__(d, b) (to be more precise, since masked arrays are a subclass of ndarrays, it uses __radd__) and is NOT an in-place assignment. It creates a new object, using the wider type of the two operands, and hence converts d (an unmasked array) to a masked array (because b is a masked array). The addition therefore uses valid values only, of which there are none here, since ALL elements of b are masked and invalid. The result is a masked array d with the same mask as b, while the data of d remains unchanged.
This difference in behavior is not Numpy-specific and applies to Python itself too. The case mentioned in the question by the OP behaves similarly, and as @alaniwi mentioned in the comments, the Boolean indexing with mask a is not fundamental to the behavior. Using a to mask elements of b, c, and d only limits the assignment to the elements selected by a (rather than all elements of the arrays) and nothing more.
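A quick plain-Python illustration of that same += versus + distinction (a side sketch, not part of the original answer):
a = [1, 2]
alias = a
a += [3]          # in-place: list.__iadd__ mutates the existing list
print(alias)      # [1, 2, 3] -- the alias sees the change

b = [1, 2]
alias_b = b
b = b + [3]       # builds a brand-new list and rebinds the name b
print(alias_b)    # [1, 2] -- the alias still points at the old object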
To make things a bit more interesting, and in fact clearer, let's switch the places of the two operands on the right-hand side:
e = np.zeros(b.shape)
#e: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
e = b + e
#e: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#e.data: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
Note that, similar to d = d + b, the right-hand side uses the masked array's __add__ function, so the output is a masked array. But since you are adding e to b (i.e. e = np.MaskedArray.__add__(b, e)), the masked data of b is returned, whereas in d = d + b you are adding b to d and the data of d is returned.

Theano: how to efficiently undo/reverse max-pooling

I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).
In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.
Here's how I'm currently doing it:
import numpy as np
import theano
import theano.tensor as T

minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5

# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    # scatter each encoded value back to its remembered position
    for whichitem in range(minibatchsize):
        for whichfilt in range(numfilters):
            upsampled = T.set_subtensor(
                upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]],
                encoded[whichitem, whichfilt, :])
    return upsampled

totalitems = minibatchsize * numfilters * numsamples
code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor  # arbitrary positions within a bin
auxpos += (np.arange(4) * 5).reshape((1, 1, -1))  # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))

print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()

get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()
(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)
The output is:
code:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
locations:
[[[ 0 6 12 18]
[ 4 5 11 17]
[ 3 9 10 16]]
[[ 2 8 14 15]
[ 1 7 13 19]
[ 0 6 12 18]]]
the un-pooled data:
[[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 2. 0.
0. 0. 0. 0. 3. 0.]
[ 0. 0. 0. 0. 4. 5. 0. 0. 0. 0. 0. 6. 0. 0.
0. 0. 0. 7. 0. 0.]
[ 0. 0. 0. 8. 0. 0. 0. 0. 0. 9. 10. 0. 0. 0.
0. 0. 11. 0. 0. 0.]]
[[ 0. 0. 12. 0. 0. 0. 0. 0. 13. 0. 0. 0. 0. 0.
14. 15. 0. 0. 0. 0.]
[ 0. 16. 0. 0. 0. 0. 0. 17. 0. 0. 0. 0. 0. 18.
0. 0. 0. 0. 0. 19.]
[ 20. 0. 0. 0. 0. 0. 21. 0. 0. 0. 0. 0. 22. 0.
0. 0. 0. 0. 23. 0.]]]
This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?
I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.
UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:
def upsamplecode2(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    # offset of each (item, filter) block within the flattened tensor
    add_to_flattened_indices = theano.shared(
        np.array([[[(y + z * numfilters) * numsamples * upsampfactor
                    for x in range(numsamples)]
                   for y in range(numfilters)]
                  for z in range(minibatchsize)],
                 dtype=theano.config.floatX).flatten(),
        name="add_to_flattened_indices")
    upsampled = T.set_subtensor(
        upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')],
        encoded.flatten()).reshape(upsampled.shape)
    return upsampled

get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2
However, this is still not good efficiency-wise, because when I run this (appended to the end of the above script) I find that the Cuda libraries can't currently do the integer index manipulation efficiently:
ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
gpu_y = gpu_from_host(y)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
dtype=x.dtype)()])
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
(self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None
I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
totalitems = minibatchsize * numfilters * numsamples
code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor
auxpos += (np.arange(4) * 5).reshape((1,1,-1))
# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code
print "numpy output:"
print upsampled_np
# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)
print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})
