How can I vertically concatenate 1x2 arrays in numpy?

How can I concatenate 1x2 arrays produced in another function?
I have a for loop that produces xa as output, which is a float64 array of shape (1L, 2L):
xa = [[ 1.17281823 1.210732 ]]
The code I tried for concatenate is
A = []
for i in range(5):
    # actually xa = UglyCalculation(**Inputs[i])
    xa = np.array([[ i, i+1 ]])  # for our example here
    # do something
All I want to do is concatenate/join/append these xa values vertically.
Basically the desired output should be
0 1
1 2
2 3
3 4
4 5
I have tried the following lines of code, but none of them work. Can you help?
A = np.concatenate((A, xa)) # Concatenate the matrices
A = np.concatenate((A,xa),axis=0)
A=np.vstack((A,xa))

Setting a shape on A allows concatenation of arrays with the same number of columns:
import numpy as np
A = np.array([])
A.shape = (0,2)  # set A to be a 0 rows x 2 cols matrix
for i in range(5):
    xa = np.array([[i, i+1]])
    A = np.concatenate((A, xa))
print(A)
output
[[ 0. 1.]
[ 1. 2.]
[ 2. 3.]
[ 3. 4.]
[ 4. 5.]]
Without A.shape = (0,2), an exception will be thrown:
Traceback (most recent call last):
File "./arr.py", line 5, in <module>
A = np.concatenate( (A,xa) )
ValueError: all the input arrays must have same number of dimensions
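A common alternative (a sketch, not part of the answer above) is to collect each 1x2 result in a Python list and stack once at the end, which avoids growing the array inside the loop:
import numpy as np

rows = []
for i in range(5):
    xa = np.array([[i, i+1]])  # stand-in for UglyCalculation(**Inputs[i])
    rows.append(xa)            # keep the 1x2 pieces in a plain list
A = np.vstack(rows)            # stack them all at once into a (5, 2) array
print(A)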

Related

Combining two numpy arrays with equations based on both arrays

I have a python numpy 3x4 array A:
A=np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
and a 3x3 array B:
B=np.array([[1,1, 1],[2, 2, 2],[3,3,3]])
I am trying to use a numpy operation to produce array C where each element in C is based on an equation using corresponding elements in A and the entire row in B. A simplified example:
C[row,col] = A[row,col] * ( A[row,col] / B[row,0] + B[row,1] + B[row,2] )
My first thought was to simply multiply all of A by a column of B, but that raises an error:
C = A * B[:,0]
Then I tried this, but it didn't work either:
C = A[:,:] * B[:,0]
I am not sure how to use the ":" operator and access a specific row and column at the same time. I can do this with regular loops, but I wanted something more numpy-like.
import numpy as np
A = np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
B = np.array([[1,1,1],[2,2,2],[3,3,3]])
C = np.zeros([3,4])
row, col = A.shape
print(A.shape)
print(A)
print(B.shape)
print(B)
print(C.shape)
print(C)
print(range(row-1))
for row in range(row):
    for col in range(col):  # note: col is reused as the loop variable, so later rows fill fewer columns
        C[row,col] = A[row,col] * ((A[row,col] / B[row,0]) + B[row,1] + B[row,2])
print(C)
Which prints:
(3, 4)
[[0 1 2 3]
[4 5 6 7]
[1 1 1 1]]
(3, 3)
[[1 1 1]
[2 2 2]
[3 3 3]]
(3, 4)
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
range(0, 2)
[[ 0. 3. 8. 15. ]
[24. 32.5 42. 0. ]
[ 6.33333333 6.33333333 0. 0. ]]
Suggestions on a better way?
Edited:
Now that I understand broadcasting a bit more, and got that code running, let me expand in a generic way on what I am trying to solve. I am trying to map values of a category such as "Air", which can take a value in a range (such as 0-5), to a shade of a given RGB color. The values are recorded over a time period.
For example, at time 1, the value of Water is 4. The standard RGB color for Water is Blue (0,0,255). There are 5 possible values for Water. In the case of Blue, 255 / 5 = 51. To get the effect of the value 4 on the Blue palette, multiply 51 x 4 = 204. Since we want higher values to be darker, we subtract 204 from 255 (white), yielding 51. The Red and Green components end up being 0. So the value read at time N scales the weighted R, G and B values. We invert by subtracting from 255 so that 0 values appear white. Stronger values are darker.
So to calculate the R' G' and B' for time 1 I used:
answer = data[:,1:4] - (data[:,1:4] / data[:,[0]] * data[:,[4]])
I can extract an [R, G, B] from the answer and put it into an image at some x, y. That works well. But I can't figure out how to use Range, R, G and B to calculate new R', G', B' for all times 1, 2, ... N. I'm trying to expand the numpy approach if possible. I did it with standard loops as:
for row in range(rows):
    for col in range(cols):
        r = int(data[row,1] - (data[row,1] / data[row,0] * data[row,col_offset+col]))
        g = int(data[row,2] - (data[row,2] / data[row,0] * data[row,col_offset+col]))
        b = int(data[row,3] - (data[row,3] / data[row,0] * data[row,col_offset+col]))
        almostImage[row,col] = [r,g,b]
I can display the image in matplotlib and save it to .png, etc. So I think the next step is to try a list comprehension over the 2D array of time points, and then refer back to the range and RGB values. I will give it a try.
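For reference, a vectorized version of that loop could look roughly like this (a sketch, assuming data is laid out as [range, R, G, B, value at time 1, value at time 2, ...] and col_offset points at the first time column):
import numpy as np

rgb  = data[:, 1:4]                           # (rows, 3): base R, G, B per category
rng  = data[:, [0]]                           # (rows, 1): the value range per category
vals = data[:, col_offset:col_offset + cols]  # (rows, cols): the readings at each time point
# broadcast to (rows, cols, 3): each time value scales its row's RGB triple
almostImage = (rgb[:, None, :] - rgb[:, None, :] / rng[:, :, None] * vals[:, :, None]).astype(int)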
Try this:
A*(A / B[:,[0]] + B[:,1:].sum(1, keepdims=True))
Output:
array([[ 0. , 3. , 8. , 15. ],
[24. , 32.5 , 42. , 52.5 ],
[ 6.33333333, 6.33333333, 6.33333333, 6.33333333]])
Explanation:
The first operation A/B[:,[0]] utilizes numpy broadcasting.
Then B[:,1:].sum(1, keepdims=True) is just B[:,1] + B[:,2], and keepdims=True allows the dimension to stay the same. Print it to see details.
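A quick way to see the shapes involved (a minimal check using the A and B from the question):
import numpy as np

A = np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
B = np.array([[1,1,1],[2,2,2],[3,3,3]])
print(B[:, [0]].shape)                       # (3, 1) -- kept 2-D, so it broadcasts across A's 4 columns
print((A / B[:, [0]]).shape)                 # (3, 4)
print(B[:, 1:].sum(1, keepdims=True).shape)  # (3, 1)
C = A * (A / B[:, [0]] + B[:, 1:].sum(1, keepdims=True))
print(C.shape)                               # (3, 4)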

python silently transposing rank 1 arrays

import numpy as np
x1 = np.arange(9.0).reshape((3, 3))
print("x1\n",x1,"\n")
x2 = np.arange(3.0)
print("x2\n",x2)
print(x2.shape,"\n")
print("Here, the shape of x2 is 3 rows by 1 column ")
print("x1@x2\n",x1@x2)
print("")
print("x2@x1 should not be possible\n",x2@x1,"\n"*3)
gives
x1
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
x2
[0. 1. 2.]
(3,)
Here, the shape of x2 is 3 rows by 1 column
x1@x2 =
[ 5. 14. 23.]
x2@x1 should not be possible, BUT
[15. 18. 21.]
Python 3 seems to silently convert x2 into a (1, 3) array so it can be multiplied with x1. Or am I missing some syntax?
The arrays are being broadcasted by Numpy.
To quote the broadcasting documentation:
The term broadcasting describes how numpy treats arrays with different
shapes during arithmetic operations. Subject to certain constraints,
the smaller array is “broadcast” across the larger array so that they
have compatible shapes. Broadcasting provides a means of vectorizing
array operations so that looping occurs in C instead of Python. It
does this without making needless copies of data and usually leads to
efficient algorithm implementations. There are, however, cases where
broadcasting is a bad idea because it leads to inefficient use of
memory that slows computation.
Add the following lines to your code to explicitly set the shape of x2 to (3, 1), and you will get an error as follows:
import numpy as np
x1 = np.arange(9.0).reshape((3, 3))
print(x1.shape) # new line added
print("x1\n",x1,"\n")
x2 = np.arange(3.0)
x2 = x2.reshape(3, 1) # new line added
print("x2\n",x2)
print(x2.shape,"\n")
print("Here, the shape of x2 is 3 rows by 1 column ")
print("x1@x2\n",x1@x2)
print("")
print("x2@x1 should not be possible\n",x2@x1,"\n"*3)
Output
(3, 3)
x1
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
x2
[[0.]
[1.]
[2.]]
(3, 1)
Here, the shape of x2 is 3 rows by 1 column
x1@x2
[[ 5.]
[14.]
[23.]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-c61849986c5c> in <module>
12 print("x1#x2\n",x1#x2)
13 print("")
---> 14 print("x2#x1 should not be possible\n",x2#x1,"\n"*3)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1)
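For the @ operator specifically, the behaviour comes from matmul's handling of 1-D operands rather than element-wise broadcasting: a 1-D first argument has a 1 prepended to its shape and a 1-D second argument has a 1 appended, and that axis is removed from the result afterwards. A quick check:
import numpy as np

x1 = np.arange(9.0).reshape((3, 3))
x2 = np.arange(3.0)
# x1 @ x2: x2 is treated as shape (3, 1), giving (3, 1), then squeezed back to (3,)
print(x1 @ x2)                # [ 5. 14. 23.]
# x2 @ x1: x2 is treated as shape (1, 3), giving (1, 3), then squeezed back to (3,)
print(x2 @ x1)                # [15. 18. 21.]
# reshaping explicitly keeps the row shape
print(x2.reshape(1, 3) @ x1)  # [[15. 18. 21.]]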

Python: concatenate an integer array with 1 empty array

I have a simple one-dimensional integer array and an empty array in NumPy. When I try to concatenate them, I get a float array.
from numpy import *
a = zeros(5,'i')
a += 1
b = []
c = hstack((a,b))
d = concatenate((a, b))
print("a",a)
print("b",b)
print("c",c)
print("d",d)
I got:
a [1 1 1 1 1]
b []
c [1. 1. 1. 1. 1.]
d [1. 1. 1. 1. 1.]
But I am looking for an integer array
[1 1 1 1 1]
How? And what is the most efficient way?
Try this way:
A NumPy array's dtype is float by default, so change it to np.int32:
a = np.zeros(5,dtype=np.int32)
a += 1
b = np.array([],dtype=np.int32)
You might create b as a zero-size np.array of dtype 'i' rather than a list, that is:
import numpy as np
a = np.zeros(5,'i')
a += 1
b = np.array([],'i')
c = np.hstack((a,b))
d = np.concatenate((a, b))
print(d)
Output:
[1 1 1 1 1]
I think numpy treats the empty array as float64 by default.
If you run the following
np.array([]).dtype
it returns dtype('float64')
so you should initialize the empty array as follows:
b=[]
b=np.array(b,dtype="int32")
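A quick side check (not part of the answers above) of the promotion that happens when the integer array meets that float64 empty array:
import numpy as np

print(np.array([]).dtype)                    # float64 -- an empty list becomes a float64 array
print(np.result_type(np.int32, np.float64))  # float64 -- so the concatenated result is promoted to float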
What is the point of ending up with the same array as the input? Use numpy.ones instead of numpy.zeros to reduce computation:
import numpy
a = numpy.ones(5, dtype=int)
b = numpy.array([], dtype=int)
d = numpy.concatenate((a, b))
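Another option (a sketch, not one of the answers above) is to keep b as a plain list and cast the concatenated result back to the integer dtype afterwards:
import numpy as np

a = np.ones(5, dtype=int)
b = []                                      # plain empty list; it becomes float64 when converted
c = np.concatenate((a, b)).astype(a.dtype)  # cast back to int after the float64 promotion
print(c)                                    # [1 1 1 1 1]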

python numpy - improve efficiency on column-wise cosine similarity

I am fairly new to programming and I never used numpy before.
So, I have a matrix with 19001 x 19001 dimensions. It contains a lot of zeros, so it is relatively sparse. I wrote some code to compute the pairwise cosine similarity of the columns that are non-zero in a given row. I add up all the pairwise similarity values for one row and do some mathematical operations on them to obtain one value per row of the matrix (see code below). It does what it is supposed to, but since it deals with a great number of dimensions, it is really slow. Is there any way to modify my code to make it more efficient?
import numpy as np
from scipy.spatial.distance import cosine

row_number = 0
out_file = open('outfile.txt', 'w')
for row in my_matrix:
    non_zeros = np.nonzero(my_matrix[row_number])[0]
    non_zeros = list(non_zeros)
    cosine_sim = []
    for item in non_zeros:
        if len(non_zeros) <= 1:
            break
        x = non_zeros[0]
        y = non_zeros[1]
        similarity = 1 - cosine(my_matrix[:, x], my_matrix[:, y])
        cosine_sim.append(similarity)
        non_zeros.pop(0)
    summing = np.sum(cosine_sim)
    mean = summing / len(cosine_sim)
    log = np.log(mean)
    out_file_value = log * -1
    out_file.write(str(row_number) + " " + str(out_file_value) + "\n")
    if row_number <= 19000:
        row_number += 1
    else:
        break
I know that there are functions to compute the cosine similarity between columns (from sklearn.metrics.pairwise import cosine_similarity), so I tried that. However, the output is kind of the same but at the same time really confusing to me, even though I read the documentation and the posts on this page referring to the issue.
For instance:
my_matrix =[[0. 0. 7. 0. 5.]
[0. 0. 11. 0. 0.]
[0. 2. 0. 0. 0.]
[0. 0. 2. 11. 5.]
[0. 0. 5. 0. 0.]]
transposed = np.transpose(my_matrix)
sim_matrix = cosine_similarity(transposed)
# resulting similarity matrix
sim_matrix =[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0.14177624 0.45112924]
[0. 0. 0.14177624 1. 0.70710678]
[0. 0. 0.45112924 0.70710678 1.]]
If I compute the cosine similarity with my code above, it returns 0.45112924 for the 1st row ([0]) and 0.14177624 and 0.70710678 for row 4 ([3]).
out_file.txt
0 0.796001425306
1 nan
2 nan
3 0.856981065776
4 nan
I greatly appreciate any help or suggestions to my question!
You can consider using scipy instead. However, it doesn't take sparse-matrix input; you have to provide a dense numpy array.
import numpy as np
from scipy.spatial.distance import cdist

X = np.random.randn(10000, 10000)
D = cdist(X, X.T, metric='cosine')  # D[i, j] = cosine distance between row i of X and column j of X
Here is the speed I got for a 10000 x 10000 random array.
%timeit cdist(X, X.T, metric='cosine')
16.4 s ± 325 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Try it on a small array:
X = np.array([[1,0,1], [0, 3, 2], [1,0,1]])
D = cdist(X, X.T, metric='cosine')
This will give
[[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]
[ 6.07767730e-01 1.67949706e-01 9.41783727e-02]
[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]]
For example, D[0, 2] is the cosine distance between row 0 of X and column 2 of X (in this small example row 0 equals column 0, so it also matches the cosine distance between columns 0 and 2):
from numpy.linalg import norm
1 - np.dot(X[:, 0], X[:,2])/(norm(X[:, 0]) * norm(X[:,2])) # give 0.422649
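If you want similarities strictly between columns, plus the per-row aggregation from the question, one option (a sketch, assuming my_matrix is a dense 2-D array and that every unique pair of non-zero columns should be averaged) is to compute the column-by-column similarity matrix once with sklearn's cosine_similarity and then index into it per row:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

sim = cosine_similarity(my_matrix.T)                # sim[i, j] = cosine similarity of columns i and j
with open('outfile.txt', 'w') as out_file:
    for row_number, row in enumerate(my_matrix):
        nz = np.nonzero(row)[0]                     # columns that are non-zero in this row
        if len(nz) <= 1:
            out_file.write(f"{row_number} nan\n")
            continue
        sub = sim[np.ix_(nz, nz)]                   # similarities among just those columns
        pairs = sub[np.triu_indices(len(nz), k=1)]  # each unordered pair counted once
        out_file.write(f"{row_number} {-np.log(pairs.mean())}\n")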

How to match elements of two series by np.array?

I have two series, namely train_1 and train_2:
import numpy as np
mean = 0
std = 1
num_samples = 4
train_1 = np.random.normal(mean, std, size=num_samples)
train_2 = np.random.normal(mean, std, size=num_samples)
I am entering this command:
X = np.array([train_1, train_2], dtype=float)
and getting this output:
array([[ 0.82561222, 0.95885746, 0.40454621, 1.37793967],
[ 0.93473674, -1.51716492, -0.56732792, 1.03333013]])
But I would like the two series to be paired element-wise, like this:
Y = np.array(([3,5], [5,1], [10,2], [6,1.5]), dtype=float)
Y
array([[ 3. , 5. ],
[ 5. , 1. ],
[ 10. , 2. ],
[ 6. , 1.5]])
I might be misunderstanding your question, but is this not simply the transpose?
X = np.array([train_1, train_2], dtype=float).T
Note the .T at the end. In this case X will have two columns: the first will be train_1, the second will be train_2.
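An equivalent variant (a small sketch, not from the original answer) uses np.column_stack, which accepts the 1-D series directly:
import numpy as np

X = np.column_stack((train_1, train_2))  # shape (num_samples, 2): column 0 is train_1, column 1 is train_2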
