I have an array y that takes a value of either 0 or 1. Then I have an array yp that takes values between 0 and 1. The two arrays have the same length.
If an entry in y is 1, then I want to append the corresponding yp to a list, otherwise I want to append 1-yp:
y = [1,1,1,0,0]
yp = [0.1, 0.2, 0.3, 0.4, 0.5]
x = []
for idx, i in enumerate(y):
    if i:
        x.append(yp[idx])
    else:
        x.append(1 - yp[idx])
Is there a shorter way to write this in Python, perhaps without a for-loop?
You can use a list comprehension with zip to iterate over both lists simultaneously:
>>> y = [1,1,1,0,0]
>>> yp = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> [b if a else 1 - b for a, b in zip(y, yp)]
[0.1, 0.2, 0.3, 0.6, 0.5]
Perhaps you are looking for something like this, if I understand you correctly?
for idx, i in enumerate(y):
    x.append(yp[idx] if i else 1 - yp[idx])
This is the ternary (conditional expression) approach.
Reference: http://book.pythontips.com/en/latest/ternary_operators.html
If your lists are very long, there's a more efficient way using numpy:
import numpy as np

y, yp = map(np.asarray, (y, yp))  # convert both lists to arrays
x = y * yp + (1 - y) * (1 - yp)
The code above, explained:
a = y * yp: gives an array of the same length as y (and yp), equal to yp where y = 1 and 0 where y = 0.
b = (1 - y) * (1 - yp): gives an array equal to 1 - yp where y = 0 and 0 where y = 1.
a + b is the element-wise sum and yields your expected result.
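For reference, a minimal runnable version using the sample data from the question (np.where(y == 1, yp, 1 - yp) is an equivalent one-liner):
import numpy as np

y = np.asarray([1, 1, 1, 0, 0])
yp = np.asarray([0.1, 0.2, 0.3, 0.4, 0.5])

x = y * yp + (1 - y) * (1 - yp)
print(x)                             # [0.1 0.2 0.3 0.6 0.5]
print(np.where(y == 1, yp, 1 - yp))  # same result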
You are asking for a list comprehension (what you call a "one-liner"), such as:
y = [1,1,1,0,0]
yp = [0.1, 0.2, 0.3, 0.4, 0.5]
l = [yp[i] if y[i] == 1 else 1 - yp[i] for i in range(len(y))]
which will give you:
>>> l
[0.1, 0.2, 0.3, 0.6, 0.5]
I am trying to rewrite the code I have as either a recursive or a dynamic-programming solution.
import numpy as np
index_list = [1, 2, 0]
weights = [0.3, 0.8]
A_matrix = np.asarray([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
initial_best_vector = A_matrix[:, 1]
# set best_vector_combinations to initial_best_vector
best_vector_combinations = initial_best_vector
for index, _ in enumerate(index_list[1:]):
    best_vector_combinations = (
        1 - weights[index]
    ) * best_vector_combinations + (
        weights[index] * A_matrix[:, index_list[index + 1]]
    )
Is it possible to do so? What I am doing is a nested linear combination of vectors, with the initial base being the initial_best_vector, which corresponds to the index_list.
In other words, letting c_i denote the columns of the matrix A, I want:
((1-0.3) * c_1 + 0.3 * c_2) * (1-0.8) + 0.8 * c_0
I hope to make this more general, so that it works for index and weight lists of any length.
Edit:
The code:
def calculate(vectors, weights):
    if not (vectors or weights):
        return 0
    if not weights:
        return vectors[0]
    return vectors[0] * (1 - weights[0]) + weights[0] * calculate(vectors[1:], weights[1:])

vectors = [1, 2, 3]
weights = [0.2, 0.3]
calculate(vectors, weights)  # returns 1.26
but the expected answer is 1.74: I would expect the first step to be 0.8 * 1 + 0.2 * 2 = 1.2, and the second to be 1.2 * 0.7 + 3 * 0.3 = 1.74. Note that I renamed your result to calculate to fix the typo, but I am still unable to recover 1.74.
If you want a recursive implementation, it would be helpful to start with a simpler example and figure out the recurrence relation.
Let vectors = [8,5,2,1] (1D array for simplicity) and let weights = [0.5, 0.8, 0.1, 0.2].
First step of computation: (8 * 0.5) + (1-0.5)*(result of second step).
Second step: 5 * 0.8 + (1-0.8)*(result of third step).
You can work this out further, but the basic relation is
result(vectors, weights) = (
    vectors[0] * weights[0] +
    (1 - weights[0]) * result(vectors[1:], weights[1:])
) if (vectors and weights) else 0
Implementation:
def calculate(vectors, weights):
    if not (vectors or weights):
        return 0
    if not weights:
        return vectors[0]
    return vectors[0] * weights[0] + (1 - weights[0]) * calculate(vectors[1:], weights[1:])

print(calculate([1, 2, 3], [0.2, 0.3]))              # left-to-right processing, 2.36
print(calculate([1, 2, 3][::-1], [0.2, 0.3][::-1]))  # right-to-left processing, 1.74
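For long input lists, Python's recursion limit can become an issue; the same right-to-left fold can also be written iteratively. A minimal sketch (calculate_iter is a hypothetical name, assuming len(vectors) == len(weights) + 1):
def calculate_iter(vectors, weights):
    # Start from the last vector (the base case of the recursion) and blend
    # in the remaining vectors from right to left: result = v*w + (1-w)*result.
    result = vectors[-1]
    for v, w in zip(vectors[-2::-1], weights[::-1]):
        result = v * w + (1 - w) * result
    return result

print(calculate_iter([1, 2, 3], [0.2, 0.3]))              # 2.36, same as the recursion
print(calculate_iter([1, 2, 3][::-1], [0.2, 0.3][::-1]))  # 1.74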
I was curious about plotting contour plots for a 2-dimensional Gaussian distribution. In my case, for a given set of 2D points, I cluster them into different grid cells, compute the covariance matrix for every cell, and plot the Gaussian distribution for each cell. When I plot, I do not want the entire contour for the cell, but the distribution restricted to within 3 sigma of the data points. Is there any way this could be done?
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
def createCells():
    partition = 4
    coords = [np.linspace(-1.0, 1.0, num=partition + 1) for i in range(2)]
    x, y = np.meshgrid(*coords)
    return x, y

def probab(mean, covMat, lPoints):
    lPoints = lPoints[..., np.newaxis] if lPoints.ndim == 2 else lPoints  ## Create vectorized values for the x, y
    if np.linalg.det(covMat) > 0:
        factor1 = (2*np.pi)*(np.linalg.det(covMat)**(-1/2))
        factor2 = np.exp((-1/2)*np.einsum('ijk,jl,ilk->ik', lPoints - mean, np.linalg.inv(covMat), lPoints - mean))
        return factor1*factor2
if __name__ == '__main__':
    points = np.array([[-0.35, -0.15], [0.1, 0.1], [-0.1, 0.1], [0.05, 0.05], [0.25, 0.05], [0.1, 0.15], [0.1, 0.2], [-0.2, -0.2], [-0.25, 0.25], [0.45, 0.45], [0.75, 0.75], [0.6, 0.6], [0.55, 0.55], [0.7, 0.7], [0.68, 0.73]])
    x1, y1 = createCells()
    x = x1[0]
    y = y1[:, 0]
    lP = np.array([])
    numberOftimes = 0
    for i in range(len(x) - 1):
        for j in range(len(y) - 1):
            count = 0
            meanX = 0.0
            meanY = 0.0
            localPoints = []
            covMat1 = np.array([])
            covMat2 = np.array([])
            for point in points:
                inbetween_x = x[i] <= point[0] <= x[i + 1]
                inbetween_y = y[j] <= point[1] <= y[j + 1]
                if inbetween_x and inbetween_y:
                    count += 1
                    meanX += point[0]
                    meanY += point[1]
                    localPoints.append([point[0], point[1]])
            if count >= 2:
                numberOftimes += 1
                #print(f"The local points are {localPoints}")
                localPoints = np.array(localPoints)
                meanX /= count
                meanY /= count
                meanXY = np.array([meanX, meanY])
                #print(meanXY.shape)
                #print(localPoints.shape)
                lP = localPoints - meanXY
                for k in range(count):
                    lPtranspose = (np.array([lP[k]])).T
                    lPCurrent = (np.array([lP[k]]))
                    if len(covMat1) > 0:
                        covMat1 += lPtranspose.dot(lPCurrent)
                    else:
                        covMat1 = lPtranspose*lP[k]
                covMat1 /= count
                lPoints = localPoints[..., np.newaxis] if lP.ndim == 2 else lP  ## Create vectorized values for the x, y
                meanXY1 = localPoints.mean(0)
                meanXY2 = lPoints.mean(0)
                covMat3 = np.einsum('ijk, ikj->jk', lPoints - meanXY2, lPoints - meanXY2) / (lPoints.shape[0] - 1)
                #yamlStatus = self.savingYaml(i, j, meanXY, covMat3)  ## To store the cell parameters in a yaml file (for now it's out of scope for the question)
                if np.linalg.det(covMat3) > 0:  # compute the probability only if the det is not 0
                    Xx = np.linspace(x[i], x[i + 1], 1000)
                    Yy = np.linspace(y[j], y[j + 1], 1000)
                    Xx, Yy = np.meshgrid(Xx, Yy)
                    lPoints = np.vstack((Xx.flatten(), Yy.flatten())).T
                    pos = np.empty(Xx.shape + (2,))
                    pos[:, :, 0] = Xx
                    pos[:, :, 1] = Yy
                    z2 = probab(meanXY2, covMat3, lPoints)
                    summed = np.sum(z2)
                    z2 = z2.reshape(1000, 1000)
                    cs = plt.contourf(Xx, Yy, z2)  #, cmap=cm.viridis)
                    plt.clabel(cs)
            localPoints = []
    #print(f"The number of times count is greater than 1 is {numberOftimes}")
    plt.plot(x1, y1, marker='.', linestyle='none', markersize=20)
    plt.plot(points[:, 0], points[:, 1], marker='.', linestyle='none', markersize=10)
    plt.grid(linewidth=0.5)  # abs(x1[0][0]-y1[0][0]))
    plt.show()
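The 3-sigma restriction itself is not implemented in the code above. One possible approach (a sketch with hypothetical stand-ins for meanXY2 and covMat3, not the asker's code) is to compute the squared Mahalanobis distance of every grid point from the cell mean and mask the density outside 3 sigma before calling contourf:
import numpy as np
import matplotlib.pyplot as plt

mean = np.array([0.6, 0.6])                     # stand-in for meanXY2
cov = np.array([[0.01, 0.005], [0.005, 0.02]])  # stand-in for covMat3

Xx, Yy = np.meshgrid(np.linspace(0.4, 0.9, 300), np.linspace(0.4, 0.9, 300))
diff = np.stack([Xx.ravel() - mean[0], Yy.ravel() - mean[1]], axis=1)

# Squared Mahalanobis distance of each grid point from the cell mean.
m2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)

# Gaussian density, set to NaN outside the 3-sigma ellipse so contourf leaves it blank.
z = np.exp(-0.5 * m2) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
z[m2 > 3**2] = np.nan

plt.contourf(Xx, Yy, z.reshape(Xx.shape))
plt.show()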
I am trying to use the list created for Mf_values in the expression for P0 in my code. I have tried it in the following way:
Mf_values=[0.8, 0.9, 1.2, 1.5]
Vinf_values=[Mf_value*(gamma*R*tatm)**0.5 for Mf_value in Mf_values]
print(Vinf_values)
P0=[(1+((gamma-1)/2)*(Mf_values**2)**(gamma/(gamma-1))]
T0=(1+((gamma-1)/2)*(Mf_values**2))*tatm
I want to use the 4 different Mf_values to evaluate the expressions for P0 and T0 and save the results in a list, similar to Vinf_values. However, Python gives me the following error:
P0=[(1+((gamma-1)/2)*(Mf_values**2)**(gamma/(gamma-1))]
^
SyntaxError: invalid syntax
How could I solve this issue?
It is easier to do what you want using numpy:
import numpy as np
# Change the below values to the correct ones
gamma = 0.5
R = 1.0
tatm = 1.0
Mf_values = np.array([0.8, 0.9, 1.2, 1.5])
Vinf_values = Mf_values * (gamma * R * tatm)**0.5
print(Vinf_values)
P0 = (1 + ((gamma - 1) / 2) * (Mf_values**2))**(gamma/(gamma - 1))
T0 = (1 + ((gamma - 1) / 2) * (Mf_values**2)) * tatm
If you really need lists, you can simply convert the arrays like this:
P0 = list(P0)
T0 = list(T0)
You cannot apply ** and the other element-wise operations you are attempting directly to a plain Python list; you can either use numpy or do the following:
Mf_values = [0.8, 0.9, 1.2, 1.5]
Vinf_values = [Mf_value * (gamma * R * tatm) ** 0.5 for Mf_value in Mf_values]

Mf_values_2 = [v ** 2 for v in Mf_values]
tmp = [1 + ((gamma - 1) / 2) * v for v in Mf_values_2]
P0 = [v ** (gamma / (gamma - 1)) for v in tmp]
tmp2 = [1 + ((gamma - 1) / 2) * v for v in Mf_values_2]
T0 = [tatm * v for v in tmp2]
To multiply every element of a list by a number, do:
# lst is a list and val is a number
result = [val * elem for elem in lst]
To add the elements of two lists element-wise, do:
# lst1 and lst2 are lists
result = [a + b for a, b in zip(lst1, lst2)]
I created a loop to simulate the training of a neural network, and I find it odd that the weights, which were first assigned as scalars, turned into a Series.
Sample data (note: I created multiple copies of the same rows to bring it up to 100 observations):
import numpy as np
import pandas as pd

#         x1   x2    y
data = [ [3.5, 1.5, 1],
[2.0, 1.0, 0],
[4.0, 1.5, 1],
[3.0, 1.0, 0],
[3.5, 0.5, 1],
[2.0, 0.5, 0],
[5.5, 1.0, 1],
[1.0, 1.0, 0] ]
#[4.5, 1.0, 1]
data = pd.DataFrame(data, columns = ["Length", "Width", "Class"])
data
Assigning Variables:
w1 = np.random.randn()
w2 = np.random.randn()
b = np.random.randn()
print(w1)
print(w2)
print(b)
Training Loop:
learning_rate = 0.2
#costs = []
for x in range(50000):
    z = train_data["Length"] * w1 + train_data["Width"] + b
    preds = sigmoid(z)
    target = train_data["Class"]
    cost = np.square(preds - target)
    derivcost_pred = 2 * (preds - target)
    derivpred_sigp = sigmoid_p(z)
    dcost_dz = derivcost_pred * derivpred_sigp
    dz_dw1 = train_data["Length"]
    dz_dw2 = train_data["Width"]
    dz_db = 1
    dcost_dw1 = dcost_dz * dz_dw1
    dcost_dw2 = dcost_dz * dz_dw2
    dcost_db = dcost_dz * dz_db
    w1 = w1 - learning_rate * dcost_dw1
    w2 = w2 - learning_rate * dcost_dw2
    b = b - learning_rate * dcost_db
My question here is: how do I get the last w1, w2, b values that were trained?
Also, if I use the Series, how can I access the last value instead?
Lastly, let me know if I did something wrong with the loop.
For your first question, since you want the last w1, w2, b values that were trained, I will assume that this corresponds to x = 50000 - 1. If so, just add one line at the end of the loop (and make sure costs = [] is uncommented before the loop):
for x in range(50000):
    .
    .
    .
    if x == 50000 - 1: costs.append([w1, w2, b])

# Print results
w1_trained, w2_trained, b_trained = costs[0][0], costs[0][1], costs[0][2]
print(w1_trained, w2_trained, b_trained)
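For the second question (accessing the last value if the weights end up as a Series): a pandas Series supports positional access via .iloc, so the last element is w1.iloc[-1]. A minimal sketch, assuming w1 is a Series:
import pandas as pd

w1 = pd.Series([0.52, 0.31, 0.17])  # hypothetical weight values stored in a Series
print(w1.iloc[-1])                  # 0.17, the last element
Note that the weights become Series in the first place because the gradients (e.g. dcost_dw1) are per-row Series; a common fix is to reduce them to a scalar, for example with dcost_dw1.mean(), before updating w1.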
I'd like to initialize an array b similar to this, but faster:
a = [0.53, 0.66, 0.064, 0.94, 0.44]
b = [0.0]*5
x = 14.3
for i in range(5):
    x = b[i] = a[i]*0.1 + x*0.9
Is there something in numpy for that purpose?
An ugly but vectorized numpy solution:
import numpy as np
a = np.array([0.53, 0.66, 0.064, 0.94, 0.44])
x = 14.3
idx = np.arange(a.size)
0.9 ** idx * (0.1 * (a * 0.9 ** (-idx)).cumsum() + x * 0.9)
# array([12.923 , 11.6967 , 10.53343 , 9.574087 , 8.6606783])
Result from for loop:
a = [0.53, 0.66, 0.064, 0.94, 0.44]
b = [0.0]*5
x = 14.3
for i in range(5):
    x = b[i] = a[i]*0.1 + x*0.9
b
#[12.923000000000002,
# 11.696700000000003,
# 10.533430000000003,
# 9.574087000000002,
# 8.660678300000002]
This is because unrolling the recurrence gives
b[i] = 0.1*(a[i] + 0.9*a[i-1] + ... + 0.9**i * a[0]) + 0.9**(i+1) * x
     = 0.9**i * (0.1*(a[0]/0.9**0 + a[1]/0.9**1 + ... + a[i]/0.9**i) + 0.9*x)
and the components of the result can be vectorized correspondingly; note that the sum of a terms in the inner parentheses is vectorized as cumsum.
Maybe we can break it down and find the relation between the steps:
x0 = x0
x1 = a[0]*0.1 + x0*0.9
x2 = a[1]*0.1 + x1*0.9 = a[1]*0.1 + (a[0]*0.1 + x0*0.9)*0.9
So xn = a[n-1]*0.1 + a[n-2]*0.1*0.9 + ... + a[0]*0.1*0.9**(n-1) + x0*0.9**n
n = np.array(0.9)**np.arange(len(a))
sum(a[::-1]*n)*0.1 + x*(0.9**(len(a)))
Out[338]: 8.6606783
Update: to get the full output array:
np.diag(np.fliplr(((a*n[::-1])*0.1).cumsum()/n[:,None]+x*(0.9**(np.arange(1,len(a)+1)))))[::-1]
Out[472]: array([12.923 , 11.6967 , 10.53343 , 9.574087 , 8.6606783])
Let's rewrite your loop to analyze it more closely:
b[0] = a[0]*0.1 + x*0.9
for i in range(1, 5):
    b[i] = a[i]*0.1 + b[i-1]*0.9
This makes it clear that the calculation is recursive: the value of b[i] depends on the value of b[i-1]. This means you cannot vectorize the calculation as a plain element-wise operation, because that kind of vectorization requires each element of the result to be independent of all the others; the closed-form cumsum approaches above work around this by unrolling the recurrence.
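For completeness, a first-order linear recurrence like this one can still be evaluated without an explicit Python loop, for example with scipy.signal.lfilter. A sketch, assuming SciPy is available, offered as an alternative to the cumsum-based answers above:
import numpy as np
from scipy.signal import lfilter

a = np.array([0.53, 0.66, 0.064, 0.94, 0.44])
x = 14.3

# b[i] = 0.1*a[i] + 0.9*b[i-1] is a first-order IIR filter; the initial
# value x enters through the filter's initial state zi.
b, _ = lfilter([0.1], [1.0, -0.9], a, zi=[0.9 * x])
print(b)  # [12.923  11.6967  10.53343  9.574087  8.6606783]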