Tensor power and multiplication in pytorch - python

I have a matrix A and a tensor b of size (1,3) - so a vector of size 3.
I want to compute
C = b1 * A + b2 * A^2 + b3 * A^3 where ^n is the n-th power of A.
At the end, C should have the same shape as A. How can I do this efficiently?

Let's try:
A = torch.ones(1,2,3)
b_vals = torch.tensor([2,3,4])
powers = torch.tensor([1,2,3])
C = (A[...,None]**powers * b_vals).sum(-1)
Output:
tensor([[[9., 9., 9.],
[9., 9., 9.]]])
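With A filled with ones, every power of A looks the same, so it is hard to tell from the output whether the weights are applied correctly. The sketch below checks the same broadcasting pattern on non-trivial values. It uses NumPy rather than PyTorch (an assumption made only to keep the check dependency-light; PyTorch follows the same broadcasting rules), and it treats ^n as an elementwise power, as the answer above does.

```python
import numpy as np

# Non-trivial input so that A, A**2 and A**3 differ from each other.
A = np.array([[[1., 2., 3.],
               [4., 5., 6.]]])          # shape (1, 2, 3)
b_vals = np.array([2., 3., 4.])         # weights b1, b2, b3
powers = np.arange(1, 4)                # exponents 1, 2, 3

# A[..., None] has shape (1, 2, 3, 1); broadcasting against the
# length-3 vectors gives shape (1, 2, 3, 3), one slice per power.
C = (A[..., None] ** powers * b_vals).sum(-1)

# Reference: the polynomial written out term by term.
expected = 2 * A + 3 * A**2 + 4 * A**3
print(np.allclose(C, expected), C.shape)
```

The trailing axis holds one weighted power per entry, so summing over it collapses the result back to the shape of A.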

Related

Efficient NumPy way of rolling nanstd

I have a 1-D NumPy array where I create a rolling window and then compute the np.nanstd:
import numpy as np
def rolling_window(a, window):
    a = np.asarray(a)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
if __name__ == "__main__":
    n = 100_000_000
    nan_indices = np.random.choice(np.arange(n), size=1000, replace=False)
    T = np.random.rand(n)
    T[nan_indices] = np.nan
    m = 50
    np.nanstd(rolling_window(T, m), axis=T.ndim)
However, I noticed that not only is this extremely time consuming, it also uses a lot of memory. Is there a way to improve both the memory and speed performance (Numba is an option)?
NumPy vectorized
After working through the math, here's what I ended up with: a few np.convolve calls and some masking give a vectorized NumPy solution -
def nanstd(a, W):
    k = np.ones(W, dtype=int)
    m = ~np.isnan(a)
    a0 = np.where(m, a,0)
    n = np.convolve(m,k,'valid')
    c1 = np.convolve(a0, k,'valid')
    f2 = c1**2
    p2 = f2/n**2
    f1 = np.convolve((a0**2)*m,k,'valid')+n*p2
    out = np.sqrt((f1 - (2/n)*f2)/n)
    return out
Complete Explanation is at the end of this post.
Pandas equivalent
Here's the equivalent pandas version, which isn't too bad on performance -
import pandas as pd
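def pdroll(T,m):
    # min_periods=1 lets windows that contain NaNs still return a value, as np.nanstd does;
    # with the default (min_periods equal to the window size) such windows come back as NaN
    return pd.Series(T).rolling(m, min_periods=1).std(ddof=0).values[m-1:]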
Benchmarking
Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
def setup(n):
    nan_indices = np.random.choice(np.arange(n), size=10, replace=False)
    T = np.random.rand(n)
    T[nan_indices] = np.nan
    return T
import benchit
f = {'rolling': lambda T,m: np.nanstd(rolling_window(T, m), axis=T.ndim),
'pdroll': pdroll, 'conv':nanstd}
in_={(n,w):(setup(n),w) for n in 10**np.arange(2,6) for w in [5,10,20,50,80,100]}
t = benchit.timings(f, in_, multivar=True)
t.plot(logx=True, sp_ncols=2, save='timings.png', dpi=200)
NumPy one is good on smaller window sizes, while pandas is better on larger ones.
NumPy vectorized : Explanation on NumPy version nanstd
Basically, np.nanstd is computing std ignoring NaNs. Now, std can be computed based on mean.
Thus, for an array a with no NaNs, it would be :
np.sqrt(np.mean((a-np.mean(a))**2)) # (1)
Let's prove it :
In [43]: a = np.arange(1,6).astype(float)
In [44]: np.nanstd(a)
Out[44]: 1.4142135623730951
In [45]: np.sqrt(np.mean((a-np.mean(a))**2))
Out[45]: 1.4142135623730951
Now, let's say, we have a NaN in it :
In [46]: a[2] = np.nan
In [47]: a
Out[47]: array([ 1., 2., nan, 4., 5.])
The std with nanstd would be :
In [48]: np.nanstd(a)
Out[48]: 1.5811388300841898
Let's figure out the equivalent one based on (1).
Let's start with (a-np.mean(a))**2.
This one : ?
In [72]: (a-np.mean(a))**2
Out[72]: array([nan, nan, nan, nan, nan])
No!
This one : ?
In [73]: (a0 - np.sum(a0)/n)**2
Out[73]: array([4., 1., 9., 1., 4.])
No! Because a is :
In [76]: a
Out[76]: array([ 1., 2., nan, 4., 5.])
We need the entry at the NaN position to be 0.
This one : ?
In [75]: m*((a0 - np.sum(a0)/n)**2)
Out[75]: array([4., 1., 0., 1., 4.])
Yes!
Then, what's np.mean((a-np.mean(a))**2)? It would be, sum of those in [75] divided by n :
In [77]: np.sum(m*((a0-np.sum(a0)/n)**2))/n
Out[77]: 2.5
Hence, the final std value :
In [78]: np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
Out[78]: 1.5811388300841898
Summarizing :
In [55]: m = ~np.isnan(a) # (2)
...: a0 = np.where(m, a,0)
...: n = m.sum()
...: out0 = np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
In [56]: out0
Out[56]: 1.5811388300841898
The next part is incorporating the sliding nature. So, we need to do (2) over sliding windows. The first two steps remain the same.
Hence, it starts off with :
m = ~np.isnan(a)
a0 = np.where(m, a,0)
But the last two would change, let's see how.
Let's focus on the final step to compute out0. We have :
m*((a0-np.sum(a0)/n)**2)
Then, we compute the summation :
np.sum(m*((a0-np.sum(a0)/n)**2))
We have : (a-b)**2 = a**2 + b**2 - 2*a*b. So, earlier step becomes
np.sum(m*(a0**2 + (np.sum(a0)/n)**2 - 2*a0*np.sum(a0)/n))
Further re-arranging (using m*a0 == a0, since a0 is already 0 at the NaN positions) leads to :
np.sum(m*(a0**2 + (np.sum(a0)/n)**2) - 2*a0*np.sum(a0)/n)
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - np.sum(2*a0*np.sum(a0)/n)
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - 2*np.sum(a0*np.sum(a0))/n
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - (2/n)*np.sum(a0*np.sum(a0)) # (3)
Let's focus on the first two parts for the summation.
Also, let's take a sample case to make things concrete. We will set up two datasets: one for the complete array, and another for a single window of the complete one.
Setup :
#=========================== 1. Complete setup
a = np.arange(1,10).astype(float)
a[[2,5]] = np.nan
W = 5
k = np.ones(W, dtype=int)
m_comp = ~np.isnan(a)
a0_comp = np.where(m_comp, a,0)
n_comp = np.convolve(m_comp,k,'valid')
c1 = np.convolve(a0_comp, k,'valid')
c2 = np.convolve((a0_comp**2)*m_comp,k,'valid')
#=========================== 2. Windowed setup
a1 = np.arange(1,6).astype(float)
a1[2] = np.nan
m = ~np.isnan(a1)
a0 = np.where(m, a1,0)
n = m.sum()
out0 = np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
From windowed setup, we have :
In [51]: np.sum(m*(a0**2 + (np.sum(a0)/n)**2))
Out[51]: 82.0
In [52]: np.sum(m*(a0**2) + m*((np.sum(a0)/n)**2))
Out[52]: 82.0
In [53]: np.sum(m*(a0**2)) + np.sum(m*((np.sum(a0)/n)**2))
Out[53]: 82.0
First summation part :
In [86]: np.sum(m*(a0**2))
Out[86]: 46.0
# complete setup version :
In [87]: c2
Out[87]: array([ 46., 45., 90., 154., 219.])
Second summation part :
In [54]: np.sum(m*((np.sum(a0)/n)**2))
Out[54]: 36.0
# complete setup version :
In [55]: n_comp*(c1/n_comp)**2
Out[55]:
array([ 36. , 40.33333333, 85.33333333, 144. ,
210.25 ])
The remaining piece of the puzzle from (3) is :
In [79]: (2/n)*np.sum(a0*np.sum(a0))
Out[79]: 72.0
Let's focus on the meat of it :
In [80]: np.sum(a0*np.sum(a0))
Out[80]: 144.0
On the complete setup, it would correspond to :
In [81]: c1**2
Out[81]: array([144., 121., 256., 576., 841.])
Thus, for the entire remaining piece :
In [82]: (2/n)*np.sum(a0*np.sum(a0))
Out[82]: 72.0
# complete setup version :
In [83]: (2/n_comp)*c1**2
Out[83]:
array([ 72. , 80.66666667, 170.66666667, 288. ,
420.5 ])
Hence, (3) and its complete version counterpart would be :
In [89]: np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - (2/n)*np.sum(a0*np.sum(a0))
Out[89]: 10.0
In [90]: c2 + n_comp*(c1/n_comp)**2 - (2/n_comp)*c1**2
Out[90]: array([10. , 4.66666667, 4.66666667, 10. , 8.75 ])
To get the final std values, we need to divide by the count of valid ones per window and then apply sqrt :
In [99]: np.sqrt((c2 + n_comp*(c1/n_comp)**2 - (2/n_comp)*c1**2)/n_comp)
Out[99]: array([1.58113883, 1.24721913, 1.24721913, 1.58113883, 1.47901995])
Hence, with some cleanup, we end up with the final nanstd version.
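As a cross-check of the derivation, the three terms in In [99] collapse algebraically to c2/n - (c1/n)**2, i.e. the mean of the squares minus the square of the mean over the valid entries of each window. The sketch below (the name rolling_nanstd is hypothetical, not from the post) uses that simplified form and compares it with a plain loop over np.nanstd on the sample array from the complete setup.

```python
import numpy as np

def rolling_nanstd(a, W):
    """Sliding-window nanstd via convolutions (simplified form of the derivation)."""
    a = np.asarray(a, dtype=float)
    k = np.ones(W)
    m = ~np.isnan(a)                               # True where the value is valid
    a0 = np.where(m, a, 0.0)                       # NaNs replaced by 0
    n = np.convolve(m.astype(float), k, 'valid')   # valid count per window
    c1 = np.convolve(a0, k, 'valid')               # sum per window
    c2 = np.convolve(a0**2, k, 'valid')            # sum of squares per window
    var = c2 / n - (c1 / n) ** 2                   # E[x^2] - E[x]^2
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(var, 0.0))

a = np.arange(1, 10).astype(float)
a[[2, 5]] = np.nan
W = 5

fast = rolling_nanstd(a, W)
# Slow reference: one np.nanstd call per window.
slow = np.array([np.nanstd(a[i:i + W]) for i in range(len(a) - W + 1)])
print(np.allclose(fast, slow))
```

One caveat with this family of formulas: subtracting two large, nearly equal sums can lose precision when the values in a window are large relative to their spread, so it may be worth comparing against the loop on realistic data before relying on it.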

User gives an M x N matrix and I want to convert it to a NumPy matrix

Input
2 3 **Here 2,3 are the size of the matrix m,n**
2 3 4
5 6 7
output
[
[2 3 4]
[5 6 7]
]
Note: use the NumPy package
If the input matrix is in the file you can use the following program:
import numpy as np
# First part: read the matrix from a file
x = np.loadtxt('file.txt', dtype=int, delimiter=' ') # you don't need to give the matrix size at the beginning of the file
print(x)
# Second part: ask the user for the size, then for each element
rows = int(input('give number of rows: '))
cols = int(input('give number of columns: '))
x = np.zeros((rows,cols), dtype=int)
for i in range(rows):
    for j in range(cols):
        x[i,j] = int(input(f"give matrix element for row {i} and column {j}: "))
print(x)
If you want the user to input the matrix elements use the second part of the program.
Reading m x n input as a numpy matrix :
import numpy as np
m,n = map(int, input().split())
arr = np.zeros((m,n))
for i in range(m):
    arr[i] = np.array(list(map(int, input().split())))
Output :
array([[1., 2., 3.],
[4., 5., 6.]])
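Another way to handle the exact input format from the question (a size line followed by the rows) is to read the size line first and let np.loadtxt parse the rest. The sketch below uses io.StringIO as a stand-in for standard input so that it runs on its own; in a real script, sys.stdin could be used in place of the stream variable (that substitution is an assumption, not something shown in the answers above).

```python
import io
import numpy as np

# Sample input from the question: size line, then the matrix rows.
text = "2 3\n2 3 4\n5 6 7\n"
stream = io.StringIO(text)

# Consume the first line to get the declared size.
m, n = map(int, stream.readline().split())

# loadtxt reads the remaining lines; ndmin=2 keeps a 2-D result
# even when the matrix has a single row.
arr = np.loadtxt(stream, dtype=int, ndmin=2)

# Check that the data matches the declared size.
if arr.shape != (m, n):
    raise ValueError(f"expected shape {(m, n)}, got {arr.shape}")
print(arr)
```

Reading the size line explicitly keeps the shape check cheap and avoids filling the array row by row.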

Generating Random vectors and matrices for weights and bias

I am trying to know the chance of fire based on sensors x1 and x2.
y=1
For this, I am trying to generate random vectors and matrices for weights and bias but I get an error.
import numpy as np
np.random.seed(seed=123)
w1 = np.random.rand(4,2)
b1 = 4*1
x = np.array([0.4, 0.32])
z1 = np.dot(w1,x) + b1
a1 = 1 / (1+np.exp(-z1))
np.random.seed(seed=123)
w2 = np.random.rand(1,4)
b2 = 1*1
z2 = np.dot(w2,x) + b2
a2 = 1 /(1+np.exp(-z2))
But I get the error below:
----> 1 z2 = np.dot(w2,x) + b2
2 a2 = np.tanh(Z1)
3 print(a2)
ValueError: shapes (2,4) and (2,) not aligned: 4 (dim 1) != 2 (dim 0)
I am not able to figure out how to solve this.
The answer is in the error - you are trying to multiply the matrices w2 and x, which have invalid dimensions to be multiplied.
Matrix w2 has 1 row and 4 columns:
>>> w2 = np.random.rand(1,4)
>>> w2.shape
(1, 4)
Matrix x has 2 entries:
>>> x = np.array([0.4, 0.32])
>>> x.shape
(2,)
Therefore, you cannot multiply these together: two matrices can be multiplied only if the number of columns in the first, w2, equals the number of rows in the second, x. Here, as the error says, 4 (dim 1) != 2 (dim 0).
You can solve this by either giving x four rows, or w2 two columns.
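If the intent is a small two-layer network (an assumption; the question does not say so explicitly), the shapes already line up once the second layer takes the first layer's output a1, which has four entries, instead of the raw input x. A minimal sketch under that assumption, reusing the names and values from the question:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(123)
x = np.array([0.4, 0.32])      # two sensor readings, shape (2,)

w1 = np.random.rand(4, 2)      # layer 1: 4 units, 2 inputs each
b1 = 4 * 1
z1 = np.dot(w1, x) + b1        # (4, 2) . (2,) -> (4,)
a1 = sigmoid(z1)

w2 = np.random.rand(1, 4)      # layer 2: 1 unit, 4 inputs
b2 = 1 * 1
z2 = np.dot(w2, a1) + b2       # (1, 4) . (4,) -> (1,)
a2 = sigmoid(z2)

print(a1.shape, a2.shape)
```

Here the number of columns of w2 (4) matches the length of a1 (4), so np.dot no longer raises the alignment error.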
Hope this helps.

Where am I going wrong in the following LP code?

I am trying to solve an LP problem in SciPy with two variables and two constraints, one an inequality and the other an equality.
To convert the inequality constraint into an equality I have added another variable called A.
Min(z) = 80x + 60y
Constraints:
0.2x + 0.32y <= 0.25
x + y = 1
x, y >= 0
I have replaced the inequality constraint with the following equations by adding an extra variable A
0.2x + 0.32y + A = 0.25
Min(z) = 80x + 60y + 0A
X+ Y + 0A = 1
from scipy.optimize import linprog
import numpy as np
z = np.array([80, 60, 0])
C = np.array([
    [0.2, 0.32, 1],
    [1, 1, 0]
])
b = np.array([0.25, 1])
x1 = (0, None)
x2 = (0, None)
sol = linprog(-z, A_eq = C, b_eq = b, bounds = (x1, x2), method='simplex')
However, I am getting an error message
Invalid input for linprog with method = 'simplex'. Length of bounds
is inconsistent with the length of c
How can I fix this?
The problem is that you do not provide bounds for A. If you e.g. run
linprog(-z, A_eq = C, b_eq = b, bounds = (x1, x2, (0, None)), method='simplex')
you will obtain:
con: array([0., 0.])
fun: -80.0
message: 'Optimization terminated successfully.'
nit: 3
slack: array([], dtype=float64)
status: 0
success: True
x: array([1. , 0. , 0.05])
As you can see, the constraints are met:
0.2 * 1 + 0.32 * 0.0 + 0.05 = 0.25 # (0.2x + 0.32y + A = 0.25)
and also
1 + 0 + 0 = 1 # (X + Y + 0A = 1)
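One point worth checking: linprog minimizes its first argument, so passing -z makes it maximize 80x + 60y, and the solution above (x = 1, y = 0, objective 80) is the maximum rather than the minimum stated in the question. A quick solver-free sanity check, sketched below, eliminates x through the equality constraint and evaluates the objective at both ends of the feasible interval for y. The helper name objective is hypothetical.

```python
# Equality constraint: x + y = 1, so x = 1 - y.
# Objective: z = 80*(1 - y) + 60*y = 80 - 20*y.
# Inequality: 0.2*(1 - y) + 0.32*y <= 0.25  ->  0.12*y <= 0.05.
y_upper = (0.25 - 0.2) / (0.32 - 0.2)

# Feasible interval for y, taking x >= 0 and y >= 0 into account.
y_lo, y_hi = 0.0, min(1.0, y_upper)

def objective(y):
    """Value of 80x + 60y on the line x + y = 1."""
    x = 1.0 - y
    return 80 * x + 60 * y

# A linear objective on an interval is extremal at the endpoints.
z_max = objective(y_lo)   # reached at x = 1, y = 0
z_min = objective(y_hi)   # reached at y = y_upper
print(f"max = {z_max:.4f} at y = {y_lo:.4f}")
print(f"min = {z_min:.4f} at y = {y_hi:.4f}")
```

If the minimum is what is wanted, passing z instead of -z to linprog should lead the solver to the second point.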

Find the sum of a pair of columns starting from first and the middle one?

My input:
a b c d e f
g h i j k l
My output should have three sets of solutions like this:
Sq( a**2 + d**2 ) + Sq ( g**2 + j**2 )
Sq( b**2 + e**2 ) + Sq ( h**2 + k**2 )
Sq( c**2 + f**2 ) + Sq ( i**2 + l**2 )
My actual text file has so many rows and columns with no header. This is what I have so far:
import os
import math
import numpy as np
for file in os.listdir("directory"):
if file.endswith(".txt"):
fin=open(file, 'r')
total = 0
for line in fin:
str = [float(x) for x in line.split()]
for i in range(len(str[0:5])):
str[i]=float(str[i])
sum=np.sum((math.pow(str[i],2)+math.pow(str[i+3],2))**0.5
total += sum
fin.close()
With a file:
1 2 3 4 5 6
11 12 13 14 15 16
Correcting indentation and range:
with open('stack53269737.txt') as f:
    total = 0
    for line in f:
        str = [float(x) for x in line.split()]
        for i in range(3):
            str[i]=float(str[i])
            sum=np.sum((math.pow(str[i],2)+math.pow(str[i+3],2))**0.5)
            total += sum
In [111]: total
Out[111]: 73.84586902040324
With further cleanup:
with open('stack53269737.txt') as f:
    total = 0
    for line in f:
        alist = [float(x) for x in line.split()]
        for i in range(3):
            total += (alist[i]**2+alist[i+3]**2)**0.5
We don't need to convert to float twice; we don't need math for simple squares.
A numpy approach:
load it with a numpy csv reader:
In [126]: data = np.genfromtxt('stack53269737.txt')
In [127]: data
Out[127]:
array([[ 1., 2., 3., 4., 5., 6.],
[11., 12., 13., 14., 15., 16.]])
reshape the array to express your row splitting:
In [128]: data1 = data.reshape(2,2,3)
In [129]: data1
Out[129]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[11., 12., 13.],
[14., 15., 16.]]])
Now we can just square all values, sum on the correct axis, take the square root and sum again:
In [130]: np.sum(np.sum(data1**2, axis=1)**.5)
Out[130]: 73.84586902040324
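The question asks for one result per column pair, while the total above adds all three together. A sketch that keeps the three sums separate and does not hard-code the number of rows or columns is shown below; it assumes an even number of columns, pairs column i with column i + half, and uses the sample values inline instead of reading the file.

```python
import numpy as np

# Same sample values as the file used above.
data = np.array([[1., 2., 3., 4., 5., 6.],
                 [11., 12., 13., 14., 15., 16.]])

half = data.shape[1] // 2                 # columns are paired i <-> i + half
left, right = data[:, :half], data[:, half:]

# np.hypot computes sqrt(left**2 + right**2) elementwise.
pair_norms = np.hypot(left, right)        # shape (rows, half)

per_pair = pair_norms.sum(axis=0)         # one sum per column pair
total = per_pair.sum()                    # grand total over all pairs
print(per_pair, total)
```

Summing over axis 0 gives the three per-pair values; summing those again reproduces the single total computed with the reshape approach.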
If you wish to do it without numpy, you could try the following:
import math
with open("data.txt", "r") as infile:
    # Split the lines and then split numbers from each line.
    lines = list(map(str.split, infile.read().split('\n')))
# Use zip to create tuples of values that take part in each operation.
lines = list(zip(*lines))
# Get length of each line.
lineLength = len(lines)
# Find the total.
total = sum([math.sqrt(int(lines[i][j])**2 + int(lines[i+3][j])**2) for i in range(lineLength-3) for j in range(2)])
print(total)
Given a file with the following data:
1 2 3 4 5 6
7 8 9 10 11 12
The result is:
57.02450048972068
