Dynamic Matrix using Numpy - python

I am working on an HMM algorithm. As I am new to Python, I wanted to know how I can dynamically change the dimensions of a matrix using NumPy.
I am getting inconsistent output with numpy's resize.
1 [[ 1.]]
2 [[ 1. 1.]]
  [[ 2. 1.]]
3 [[ 2. 1. 0.]
   [ 0. 0. 1.]]
4 [[ 2. 1. 0. 0.]
   [ 0. 1. 0. 0.]
   [ 0. 0. 0. 1.]]
In the fourth step I expect the output to be:
[[ 2. 1. 0. 0.]
 [ 0. 0. 1. 0.]
 [ 0. 0. 0. 1.]]
Code:
import csv

# words, poslist, wcount, poscount and wordlikelihoodmatrix are
# initialised before this snippet (not shown here)
with open("SampleFile.tsv") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    wordindex = 0
    postagindex = 0
    flag = 0
    for line in tsvreader:
        if len(line) > 0:
            if line[1] not in words:
                wcount += 1
                flag = 1
                #wordlikelihoodmatrix.resize(poscount, wcount)
                words.append(line[1])
                wordindex = words.index(line[1])
            else:
                wordindex = words.index(line[1])
            if line[2] not in poslist:
                poscount += 1
                flag = 1
                #wordlikelihoodmatrix.resize(poscount, wcount)
                poslist.append(line[2])
                posindex = poslist.index(line[2])
            else:
                posindex = poslist.index(line[2])
            if flag == 1:
                wordlikelihoodmatrix.resize(poscount, wcount)
                flag = 0
            wordlikelihoodmatrix[posindex][wordindex] += 1
            print wordlikelihoodmatrix
Can anybody suggest where I am going wrong while resizing, or any other way to get the expected output without using numpy?
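A likely explanation (my reading of the numbers above, not stated in the original post): ndarray.resize keeps the existing data in flat C order, so when the number of columns changes, the old counts shift into different cells. A minimal sketch of a safer way to grow the matrix, assuming the counts should stay at their original (row, column) positions, is to allocate a fresh zero matrix and copy the old one into its top-left corner:
import numpy as np

def grow_matrix(m, rows, cols):
    # hypothetical helper, not part of the original code:
    # returns a (rows, cols) zero matrix with the old counts kept
    # at their original (row, column) positions
    grown = np.zeros((rows, cols))
    grown[:m.shape[0], :m.shape[1]] = m
    return grown

m = np.array([[2., 1.],
              [0., 1.]])
m = grow_matrix(m, 3, 4)
print(m)
# [[2. 1. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 0.]]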

Related

Is there a python function for assigning values to several elements in a list?

I used the code shown below to create a list of lists.
Code:
import numpy as np

num = 782
sol = 4
pop_size = [sol, num]
initial_population_1 = np.random.uniform(low=0.0, high=0.0, size=pop_size)
The list of lists is shown below:
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
How can I randomly assign five values that are greater than 0 but less than 10 to five elements in each inner list?
Thank you very much!
So, you have a list of lists, specifically a list of 4 lists, each of them containing 782 elements, all 0.0, and you want to set 5 elements in each of them to a random value between 0 and 10.
I'd like to mention that, as you are using Numpy, there is np.zeros(shape) that provides you with a zero-filled array, but whatever…
From your question it's not clear whether you want to avoid using the same location twice, but let's assume that you want to assign a random value to exactly 5 distinct entries in each row:
for row in initial_population_1:
    locations_used_in_this_row = 0
    while locations_used_in_this_row != 5:
        column = np.random.randint(num)
        if row[column] == 0.0:
            row[column] = np.random.rand() * 10
            locations_used_in_this_row += 1
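If you prefer to avoid the rejection loop, here is a sketch of a more direct alternative (it still loops over the 4 rows, which is cheap; it again assumes exactly 5 distinct columns per row, and the exact random distribution is up to you):
import numpy as np

initial_population_1 = np.zeros((4, 782))
for row in initial_population_1:
    # pick 5 distinct column indices, then 5 values in [0, 10)
    cols = np.random.choice(row.size, size=5, replace=False)
    row[cols] = np.random.uniform(0.0, 10.0, size=5)
print((initial_population_1 > 0).sum(axis=1))   # 5 non-zero entries per row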

Is it possible to find similarities between rows in a matrix without loop?

I have a 2D numpy array. I'm trying to compute the similarities between rows and put them into a similarities array. Is this possible without a loop? Thanks for your time!
import numpy as np

# ratings.shape = (943, 1682)
arri = np.zeros(943)
arri = np.where(arri == 0)[0]
arrj = np.zeros(943)
arrj = np.where(arrj == 0)[0]
similarities = np.zeros((ratings.shape[0], ratings.shape[0]))
similarities[arri, arrj] = np.abs(ratings[arri] - ratings[arrj])
I want to make a 2D array similarities in which similarities[i, j] is the difference between row i and row j of ratings, but I get:
ValueError: shape mismatch: value array of shape (943,1682) could not be broadcast to indexing result of shape (943,)
(screenshot: https://i.stack.imgur.com/gtst9.png)
The problem is how numpy iterates through the array when indexing a two-dimensional array with two arrays.
First some setup:
import numpy
ratings = numpy.arange(1, 6)
indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]
ratings: [1 2 3 4 5]
indicesX: [[0][1][2][3][4]]
indicesY: [[0][1][2][3][4]]
Now let's see what your program produces:
similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[0])
similarities:
[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 2. 0. 0.]
[0. 0. 0. 3. 0.]
[0. 0. 0. 0. 4.]]
As you can see, numpy iterates over similarities basically like the following:
for i in range(5):
    similarities[indicesX[i], indicesY[i]] = numpy.abs(ratings[i]-ratings[0])
similarities:
[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 2. 0. 0.]
[0. 0. 0. 3. 0.]
[0. 0. 0. 0. 4.]]
Now instead we need indices like the following to iterate through the entire array:
indicesX = [0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]
indicesY = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]
We can build them as follows:
# Reshape indicesX from (x,1) to (x,). That's important for numpy.tile().
indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])
indicesY = numpy.repeat(indicesY, ratings.shape[0])
indicesX: [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]
indicesY: [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4]
Perfect! Now just call similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY]) again and we see:
similarities:
[[0. 1. 2. 3. 4.]
[1. 0. 1. 2. 3.]
[2. 1. 0. 1. 2.]
[3. 2. 1. 0. 1.]
[4. 3. 2. 1. 0.]]
Here is the whole code again:
import numpy
ratings = numpy.arange(1, 6)
indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]
similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])
indicesY = numpy.repeat(indicesY, ratings.shape[0])
similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY])
print(similarities)
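For the original (943, 1682) ratings matrix, where each entry of similarities presumably collapses the 1682-column difference into a single number, a fully vectorised sketch (assuming the sum of absolute differences is the metric you want; note that the intermediate (943, 943, 1682) array needs on the order of 12 GB, so you may have to process it in chunks) would be:
import numpy as np

ratings = np.random.rand(100, 50)                   # toy size; the real one is (943, 1682)
diff = ratings[:, None, :] - ratings[None, :, :]    # shape (100, 100, 50) via broadcasting
similarities = np.abs(diff).sum(axis=2)             # shape (100, 100)
print(similarities.shape)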
PS
You commented on your own post to improve it. When you want to improve your question, you should edit it instead of commenting on it.

IndexError: index n is out of bounds for axis 1 with size n

I have an empty matrix M with M.shape:
(179, 179)
Now I want to populate it using the following loop:
for game in range(len(games)-1):
    df_round = df_games_position[df_games_position['rodada_id'] == games['rodada_id'][game]]
    players_home = df_round[df_round['time_id'] == games['time_id'][game]]
    players_away = df_round[df_round['time_id'] == games['adversario_id'][game]]
    count = 0
    for j_home in range(len(players_home)):
        count_fora = 0
        for j_away in range(len(players_away)):
            score_home = 0
            score_away = 0
            points_j_home = players_home['points_num'].iloc[j_home]
            points_j_away = players_away['points_num'].iloc[j_away]
            print ('POINTS HOME', points_j_home)
            print ('POINTS AWAY', points_j_away)
            soma = points_j_home + points_j_away
            if soma != 0:
                score_home = points_j_home / soma
                score_away = points_j_away / soma
            print ('SCORE HOME', score_home)
            print ('SCORE AWAY', score_away)
            j1 = players_home['Rank'].iloc[j_home].astype('int64')
            j2 = players_away['Rank'].iloc[j_away].astype('int64')
            print ('j1', j1)
            print ('j2', j2)
            M[j1,j1] = M[j1,j1] + games['goals_home_norm'][game] + score_home
            M[j1,j2] = M[j1,j2] + games['goals_away_norm'][game] + score_away
            M[j2,j1] = M[j2,j1] + games['goals_home_norm'][game] + score_home
            M[j2,j2] = M[j2,j2] + games['goals_away_norm'][game] + score_away
            print (M)
            count += 1
            print ('COUNT', count)
Finally I get the error:
M[j1,j2] = M[j1,j2] + games['gols_fora_norm'][game] + score_home
IndexError: index 179 is out of bounds for axis 1 with size 179
My last iteration round of prints:
COUNT 3
SCORE HOME 0.0
SCORE AWAY 0.0
j1 7
j2 162
[[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 8.57263145]]
COUNT 4
SCORE HOME 0.0
SCORE AWAY 0.0
j1 7
j2 179
What am I missing?
The matrix is a numpy array, and its indexing starts at 0, not 1:
np.array([1,2,3,4]).shape
Out[29]: (4,)
np.array([1,2,3,4])[3]
Out[30]: 4
Your Rank values apparently run from 1 up to 179, so index 179 is out of bounds for an axis of size 179. A simple fix is to create the empty M with shape (180, 180), fill it using the 1-based ranks, and afterwards drop the unused first row and column:
M = M[1:,1:]
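A minimal sketch of that idea, using the j1/j2 values from the question's output (alternatively you could keep M at (179, 179) and subtract 1 from the ranks before indexing):
import numpy as np

# one extra row/column so the 1-based Rank values are valid indices
M = np.zeros((180, 180))

j1, j2 = 7, 179          # example ranks from the question's last iteration
M[j1, j2] += 1.0         # no IndexError now

# once the matrix is filled, drop the unused 0th row and column
M = M[1:, 1:]
print(M.shape)           # (179, 179)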

How to calculate formula for every value in an array?

I'm trying to understand how to use numpy to evaluate a formula at different times. The way the code is written gives all the values where y is greater than or equal to 0. I am experimenting with how to get the values for all y's.
Can someone explain the part ft = t * [y >= 0.0]? How do I use the part within the brackets?
from numpy import *
g = 10.0
h0 = 10.0
t = arange(0, 10.1 ,0.1)
y = h0 - 0.5*g*t*t
ft = t * [y >= 0.0 ]
print(ft)
This is the output, but I would like to see all the values calculated. I experimented a bit but could not figure out how to do it, or how the [y >= 0.0] part exactly works.
[[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]
If I use [y] instead of [y >= 0.0] I get the following:
[[ 0.000000e+00 9.950000e-01 1.960000e+00 2.865000e+00 3.680000e+00
4.375000e+00 4.920000e+00 5.285000e+00 5.440000e+00 5.355000e+00
5.000000e+00 4.345000e+00 3.360000e+00 2.015000e+00 2.800000e-01
-1.875000e+00 -4.480000e+00 -7.565000e+00 -1.116000e+01 -1.529500e+01
-2.000000e+01 -2.530500e+01 -3.124000e+01 -3.783500e+01 -4.512000e+01
-5.312500e+01 -6.188000e+01 -7.141500e+01 -8.176000e+01 -9.294500e+01
-1.050000e+02 -1.179550e+02 -1.318400e+02 -1.466850e+02 -1.625200e+02
-1.793750e+02 -1.972800e+02 -2.162650e+02 -2.363600e+02 -2.575950e+02
-2.800000e+02 -3.036050e+02 -3.284400e+02 -3.545350e+02 -3.819200e+02
-4.106250e+02 -4.406800e+02 -4.721150e+02 -5.049600e+02 -5.392450e+02
-5.750000e+02 -6.122550e+02 -6.510400e+02 -6.913850e+02 -7.333200e+02
-7.768750e+02 -8.220800e+02 -8.689650e+02 -9.175600e+02 -9.678950e+02
-1.020000e+03 -1.073905e+03 -1.129640e+03 -1.187235e+03 -1.246720e+03
-1.308125e+03 -1.371480e+03 -1.436815e+03 -1.504160e+03 -1.573545e+03
-1.645000e+03 -1.718555e+03 -1.794240e+03 -1.872085e+03 -1.952120e+03
-2.034375e+03 -2.118880e+03 -2.205665e+03 -2.294760e+03 -2.386195e+03
-2.480000e+03 -2.576205e+03 -2.674840e+03 -2.775935e+03 -2.879520e+03
-2.985625e+03 -3.094280e+03 -3.205515e+03 -3.319360e+03 -3.435845e+03
-3.555000e+03 -3.676855e+03 -3.801440e+03 -3.928785e+03 -4.058920e+03
-4.191875e+03 -4.327680e+03 -4.466365e+03 -4.607960e+03 -4.752495e+03
-4.900000e+03]]
I would like to know how I can use numpy to calculate all the outcomes of a formula for different times at once.
Thanks,
y >= 0.0 gives you an array of Booleans which contains True/False depending on whether the condition y >= 0.0 is fulfilled. When you enclose it within [] as [y >= 0.0], you get a list which contains a single array of Booleans, as pointed out by @nicola in the comments below.
[array([ True, True, True, True, True, False, False, False,...
... False, False, False, False])]
Now you multiply this with your arange array, which gives you 0 where the right-hand side of the * operator is False and the actual value from the arange where it is True.
The expression y >= 0.0 produces an array of booleans, i.e. True (1) where y >= 0 and False (0) where it is not. That array of 1's and 0's is then multiplied element-wise by t.
It is not clear to me from your question, however, what you are trying to do with it.
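For what it may be worth, a short sketch of the usual idioms (my own illustration, not taken from either answer): y = h0 - 0.5*g*t*t already evaluates the formula for every time in t at once, and the double brackets in your output come from multiplying by a list containing one boolean array, which makes the result 2-D. If you only want the times where y is non-negative, index with the boolean array rather than multiplying by it:
import numpy as np

g, h0 = 10.0, 10.0
t = np.arange(0, 10.1, 0.1)
y = h0 - 0.5 * g * t * t        # formula evaluated for every t at once

mask = y >= 0.0                 # boolean array, same shape as t
print(y)                        # all 101 values, positive and negative
print(t[mask])                  # only the times where y >= 0
print(y[mask])                  # the corresponding heights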

Theano: how to efficiently undo/reverse max-pooling

I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).
In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.
Here's how I'm currently doing it:
import numpy as np
import theano
import theano.tensor as T

minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5

# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    for whichitem in range(minibatchsize):
        for whichfilt in range(numfilters):
            upsampled = T.set_subtensor(upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]], encoded[whichitem, whichfilt, :])
    return upsampled

totalitems = minibatchsize * numfilters * numsamples
code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor # arbitrary positions within a bin
auxpos += (np.arange(4) * 5).reshape((1,1,-1)) # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))

print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()

get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()
(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)
The output is:
code:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
locations:
[[[ 0 6 12 18]
[ 4 5 11 17]
[ 3 9 10 16]]
[[ 2 8 14 15]
[ 1 7 13 19]
[ 0 6 12 18]]]
the un-pooled data:
[[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 2. 0.
0. 0. 0. 0. 3. 0.]
[ 0. 0. 0. 0. 4. 5. 0. 0. 0. 0. 0. 6. 0. 0.
0. 0. 0. 7. 0. 0.]
[ 0. 0. 0. 8. 0. 0. 0. 0. 0. 9. 10. 0. 0. 0.
0. 0. 11. 0. 0. 0.]]
[[ 0. 0. 12. 0. 0. 0. 0. 0. 13. 0. 0. 0. 0. 0.
14. 15. 0. 0. 0. 0.]
[ 0. 16. 0. 0. 0. 0. 0. 17. 0. 0. 0. 0. 0. 18.
0. 0. 0. 0. 0. 19.]
[ 20. 0. 0. 0. 0. 0. 21. 0. 0. 0. 0. 0. 22. 0.
0. 0. 0. 0. 23. 0.]]]
This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?
I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.
UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:
def upsamplecode2(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    add_to_flattened_indices = theano.shared(np.array([ [[(y + z * numfilters) * numsamples * upsampfactor for x in range(numsamples)] for y in range(numfilters)] for z in range(minibatchsize)], dtype=theano.config.floatX).flatten(), name="add_to_flattened_indices")
    upsampled = T.set_subtensor(upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')], encoded.flatten()).reshape(upsampled.shape)
    return upsampled

get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2
However, this is still not good efficiency-wise, because when I run this (added on to the end of the above script) I find that the CUDA libraries can't currently do the integer index manipulation efficiently:
ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
gpu_y = gpu_from_host(y)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
node = self.make_node(*inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
dtype=x.dtype)()])
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
(self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None
I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.
import numpy as np
import theano
import theano.tensor as T
minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5
totalitems = minibatchsize * numfilters * numsamples
code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))
auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor
auxpos += (np.arange(4) * 5).reshape((1,1,-1))
# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code
print "numpy output:"
print upsampled_np
# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)
print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})
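Not part of the question or the answer above, but since the post never shows where auxpos comes from, here is a rough numpy sketch of how the auxiliary positions could be recorded during the pooling step itself, assuming non-overlapping windows of size upsampfactor along the last axis (the function name and shapes are just illustrative):
import numpy as np

def maxpool_with_positions(signal, upsampfactor):
    # max-pool non-overlapping windows along the last axis and record
    # where each maximum came from in the un-pooled axis
    b, f, n = signal.shape
    windows = signal.reshape(b, f, n // upsampfactor, upsampfactor)
    pooled = windows.max(axis=-1)
    # position inside each window, shifted to the absolute temporal position
    positions = windows.argmax(axis=-1) + np.arange(n // upsampfactor) * upsampfactor
    return pooled, positions

signal = np.random.rand(2, 3, 20)
pooled, auxpos = maxpool_with_positions(signal, 5)
print(pooled.shape)    # (2, 3, 4)
print(auxpos.shape)    # (2, 3, 4)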
