Linear regression prediction not matching training data

I am new to machine learning. I am trying a simple prediction using linear regression with "made up" data that follows a specific pattern. For some reason, the prediction does not match the training data. Can you let me know what I need to fix? The sample code is below:
from sklearn import linear_model
import numpy as np
X = np.random.randint(3, size=(3, 1000))
Y = np.random.randint(10, size=(1, 1000))
# f1, f2, f3 - min = 0, max = 2
# f1 = 0 and f2 = 1 then 7 <= Y < 10, irrespective of f3
# f1 = 1 and f2 = 2 Y is 0, irrespective of f3
# f1 = 0 and f2 = 2 if f3 = 2 then 3 <= Y < 7 else Y = 0
for i in range(1000):
    if ((X[0][i] == 0 and X[1][i] == 1) or (X[0][i] == 1 and X[1][i] == 0)):
        Y[0][i] = np.random.randint(7, 10)
    elif ((X[0][i] == 1 and X[1][i] == 2) or (X[0][i] == 2 and X[1][i] == 1)):
        Y[0][i] = 0
    elif ((X[0][i] == 0 and X[1][i] == 2 and X[2][i] == 2) or
          (X[0][i] == 2 and X[1][i] == 0 and X[2][i] == 2)):
        Y[0][i] = np.random.randint(3, 7)
    else:
        Y[0][i] = 0
X1 = X.transpose()
Y1 = Y.reshape(-1, 1)
print(list(zip(X1, Y1)))
# create and fit the model
clf = linear_model.LinearRegression()
clf.fit(X1, Y1)
Z = np.array([[0, 0, 0, 0, 1, 1],
              [1, 1, 2, 2, 2, 2],
              [1, 2, 1, 2, 1, 2]])
Z1 = Z.transpose()
print(Z1)
y_predict = clf.predict(Z1)
print(y_predict)

And why would it match the training data? Your X -> Y relation is clearly non-linear. Only for a perfectly linear relation, meaning Y = AX + b, can you expect linear regression to fit the training data perfectly. Otherwise the fit can end up arbitrarily far from the training targets; see, for example, Anscombe's quartet (illustrated on Wikipedia).
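To make this concrete, here is a minimal sketch in plain Python that fits a one-variable least-squares line to an exactly linear target and to a non-linear one. The helper names `fit_line` and `max_training_error` are made up for illustration and are not part of scikit-learn. Only the linear target is reproduced exactly on its own training data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def max_training_error(xs, ys):
    """Largest absolute gap between the fitted line and the training targets."""
    a, b = fit_line(xs, ys)
    return max(abs(a * x + b - y) for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4]
linear_ys = [2 * x + 1 for x in xs]   # exactly Y = 2X + 1
nonlinear_ys = [x ** 2 for x in xs]   # Y = X^2, not linear in X

print(max_training_error(xs, linear_ys))     # essentially zero
print(max_training_error(xs, nonlinear_ys))  # clearly non-zero
```

The rule-based targets in the question behave like the second case: no single set of linear coefficients can reproduce them, so the predictions differ from the training values.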

Remove elements from Numpy array until y has equivalent elements in each value

I have an array y composed of 0 and 1, but at a different frequency.
For example:
y = np.array([0, 0, 1, 1, 1, 1, 0])
And I have an array x of the same length.
x = np.array([0, 1, 2, 3, 4, 5, 6])
The idea is to filter out elements until there are the same number of 0 and 1.
A valid solution would be to remove index 5:
x = np.array([0, 1, 2, 3, 4, 6])
y = np.array([0, 0, 1, 1, 1, 0])
A naive method I can think of is to take the difference between the value counts of y (in this case 4-3=1), create a mask for y == 1, and switch random elements from True to False until the difference is 0. Then create a mask for y == 0, do an OR between the two masks, and apply the result to both x and y.
This doesn't really seem like the best "python/numpy way" of doing it though.
Any suggestions? Something like randomly selecting n elements from the value with the highest count, where n is the count of the least frequent value.
If this is easier with pandas then that would work for me too.
Naive algorithm, assuming there are more 1s than 0s:
mask_pos = y == 1
mask_neg = y == 0
pos = len(y[mask_pos])
neg = len(y[mask_neg])
diff = pos-neg
while diff > 0:
    rand = np.random.randint(0, len(y))
    if mask_pos[rand] == True:
        mask_pos[rand] = False
        diff -= 1
mask_final = mask_pos | mask_neg
y_new = y[mask_final]
x_new = x[mask_final]
This naive algorithm is really slow.

One way to do that with NumPy is this:
import numpy as np
# Makes a mask to balance ones and zeros
def balance_binary_mask(binary_array):
    # Cast to bool so that ~ below is a logical (not bitwise) inversion
    binary_array = np.asarray(binary_array).ravel().astype(bool)
    # Count number of ones
    z = np.count_nonzero(binary_array)
    # If there are fewer ones than zeros
    if z <= len(binary_array) // 2:
        # Invert the array
        binary_array = ~binary_array
    # Find ones
    idx = np.nonzero(binary_array)[0]
    # Number of elements to remove
    rem = 2 * len(idx) - len(binary_array)
    # Pick random indices to remove
    rem_idx = np.random.choice(idx, size=rem, replace=False)
    # Make mask
    mask = np.ones_like(binary_array, dtype=bool)
    # Mask elements to remove
    mask[rem_idx] = False
    return mask
# Test
np.random.seed(0)
y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
m = balance_binary_mask(y)
print(m)
# [ True True True True False True True]
y = y[m]
x = x[m]
print(y)
# [0 0 1 1 1 0]
print(x)
# [0 1 2 3 5 6]
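The idea floated in the question (randomly select n elements per value, where n is the count of the least frequent value) could also be sketched directly with index sampling. This is only an illustrative alternative: the function name `balanced_indices` and its `rng` parameter are hypothetical, and it assumes y contains only 0 and 1.

```python
import numpy as np


def balanced_indices(y, rng=None):
    """Return sorted indices that keep equally many 0s and 1s from y."""
    y = np.asarray(y).ravel()
    rng = np.random.default_rng() if rng is None else rng
    idx_zero = np.flatnonzero(y == 0)
    idx_one = np.flatnonzero(y == 1)
    n = min(len(idx_zero), len(idx_one))  # size of the smaller class
    # Sample n indices from each class without replacement
    keep_zero = rng.choice(idx_zero, size=n, replace=False)
    keep_one = rng.choice(idx_one, size=n, replace=False)
    # Sort so the surviving elements keep their original order
    return np.sort(np.concatenate([keep_zero, keep_one]))


y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
keep = balanced_indices(y, rng=np.random.default_rng(0))
x_new, y_new = x[keep], y[keep]
print(x_new, y_new)
```

Because the same index array is applied to both x and y, the two arrays stay aligned.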

How could I modify the formula of A star algorithm to what I need below?

I'm working on the A* algorithm and trying to build a trajectory for a drone with it. My code is below. I need to take the height of obstacles into account and change my equation from
F = G + H to F = H + G + E
where E represents the elevation of obstacles. The drone flies at a specific altitude over a map. If an obstacle is very high, its risk is very high because the distance between the obstacle and the drone is too small, so the drone should prefer to fly over shorter obstacles. If an obstacle is higher than the drone's altitude, the drone should go around it.
I added an elevation map with randomly generated heights and a drone altitude, but it doesn't work for me. Could I get some assistance, please?
The A* Python code:
import numpy
grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
heuristic = [[9, 8, 7, 6, 5, 4],
             [8, 7, 6, 5, 4, 3],
             [7, 6, 5, 4, 3, 2],
             [6, 5, 4, 3, 2, 1],
             [5, 4, 3, 2, 1, 0]]
init = [0,0]
goal = [len(grid)-1,len(grid[0])-1]
delta = [[-1 , 0], #up
         [ 0 ,-1], #left
         [ 1 , 0], #down
         [ 0 , 1]] #right
delta_name = ['^','<','V','>'] #The name of above actions
cost = 1 #Each step costs you one
drone_height = 60
def search():
    #open list elements are of the type [g,x,y]
    closed = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
    action = [[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    #We initialize the starting location as checked
    closed[init[0]][init[1]] = 1
    expand=[[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    elvation = numpy.random.randint(0, 100+1, size=(5, 6))
    print(elvation)
    # we assigned the cordinates and g value
    x = init[0]
    y = init[1]
    g = 0
    h = heuristic[x][y]
    e = elvation[x][y]
    f = g + h + e
    #our open list will contain our initial value
    open = [[f, g, h, x, y]]
    found = False #flag that is set when search complete
    resign = False #Flag set if we can't find expand
    count = 0
    #print('initial open list:')
    #for i in range(len(open)):
    #print(' ', open[i])
    #print('----')
    while found is False and resign is False:
        #Check if we still have elements in the open list
        if len(open) == 0: #If our open list is empty, there is nothing to expand.
            resign = True
            print('Fail')
            print('############# Search terminated without success')
            print()
        else:
            #if there is still elements on our list
            #remove node from list
            open.sort()
            open.reverse() #reverse the list
            next = open.pop()
            #print('list item')
            #print('next')
            x = next[3]
            y = next[4]
            g = next[1]
            expand[x][y] = count
            count+=1
            #Check if we are done
            if x == goal[0] and y == goal[1]:
                found = True
                print(next) #The three elements above this "if".
                print('############## Search is success')
                print()
            else:
                #expand winning element and add to new open list
                for i in range(len(delta)):
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    #if x2 and y2 falls into the grid
                    if x2 >= 0 and x2 < len(grid) and y2 >=0 and y2 <= len(grid[0])-1:
                        #if x2 and y2 not checked yet and there is not obstacles
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0 and e < drone_height:
                            g2 = g + cost #we increment the cose
                            h2 = heuristic[x2][y2]
                            e2 = elvation[x2][y2]
                            f2 = g2 + h2 + e2
                            open.append([f2,g2,h2,x2,y2]) #we add them to our open list
                            #print('append list item')
                            #print([g2,x2,y2])
                            #Then we check them to never expand again
                            closed[x2][y2] = 1
                            action[x2][y2] = i
    for i in range(len(expand)):
        print(expand[i])
    print()
    policy=[[' ' for row in range(len(grid[0]))] for col in range(len(grid))]
    x=goal[0]
    y=goal[1]
    policy[x][y]='*'
    while x !=init[0] or y !=init[1]:
        x2=x-delta[action[x][y]][0]
        y2=y-delta[action[x][y]][1]
        policy[x2][y2]= delta_name[action[x][y]]
        x=x2
        y=y2
    for i in range(len(policy)):
        print(policy[i])
search()
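One possible way to fold the elevation term into the search is sketched below, under stated assumptions rather than as a fix of the code above. It treats any cell whose elevation is at or above the drone's altitude as blocked, and it adds an elevation-dependent penalty to the step cost (so it accumulates in G) instead of adding the raw elevation to F, which keeps the Manhattan heuristic admissible. The function name `astar_with_elevation`, the `risk_weight` parameter, and the penalty formula are all hypothetical choices.

```python
import heapq


def astar_with_elevation(grid, elevation, start, goal, drone_altitude, risk_weight=1.0):
    """A* on a 4-connected grid where tall (but passable) cells cost more.

    Assumption: a cell is blocked if grid marks it as an obstacle or if its
    elevation is at or above drone_altitude. Entering any other cell costs
    1 + risk_weight * elevation / drone_altitude.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance; admissible because every step costs at least 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best_g = {start: 0.0}
    came_from = {}
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1], g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was found already
        for dx, dy in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            nx, ny = cell[0] + dx, cell[1] + dy
            if not (0 <= nx < rows and 0 <= ny < cols):
                continue
            if grid[nx][ny] == 1 or elevation[nx][ny] >= drone_altitude:
                continue  # wall, or too tall to fly over
            g2 = g + 1.0 + risk_weight * elevation[nx][ny] / drone_altitude
            if g2 < best_g.get((nx, ny), float("inf")):
                best_g[(nx, ny)] = g2
                came_from[(nx, ny)] = cell
                heapq.heappush(open_heap, (g2 + heuristic((nx, ny)), g2, (nx, ny)))
    return None, float("inf")


grid = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
elevation = [[0, 50, 0], [0, 80, 0], [0, 10, 0]]
# With no risk penalty the drone flies straight over the 50-unit obstacle
direct_path, direct_cost = astar_with_elevation(grid, elevation, (0, 0), (0, 2), 60, risk_weight=0.0)
# With a heavy penalty it detours over the 10-unit obstacle instead
detour_path, detour_cost = astar_with_elevation(grid, elevation, (0, 0), (0, 2), 60, risk_weight=10.0)
print(direct_path)
print(detour_path)
```

The 80-unit cell is never entered in either run because it exceeds the assumed altitude of 60. Tuning `risk_weight` trades path length against clearance.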

How to loop through lists from pandas dataframe in a function

Here is my dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id': [102,103,104,303,305],'ExpG_Home':[1.8,1.5,1.6,1.8,2.9],
                   'ExpG_Away':[2.2,1.3,1.2,2.8,0.8],
                   'HomeG_Time':[[93, 109, 187],[169], [31, 159],[176],[16, 48, 66, 128]],
                   'AwayG_Time':[[90, 177],[],[],[123,136],[40]]})
First, I need to create an array y for a given Id, using the values from the same row (ExpG_Home and ExpG_Away):
y = [1 - (ExpG_Home + ExpG_Away), ExpG_Home, ExpG_Away]
Second, and this is what I find much harder: for the Id used to create y, the function below takes the corresponding lists from HomeG_Time and AwayG_Time and creates an array. Unfortunately, my function handles only one row at a time, and I need to do this for a large dataset.
x1 = [1,0,0]
x2 = [0,1,0]
x3 = [0,0,1]
total_timeslot = 200 # number of timeslot per game.
k = 1 # constant
#For Id=102 with ExpG_Home=2.2 and ExpG_Away=1.8
HomeG_Time = [93, 109, 187]
AwayG_Time = [90, 177]
y = np.array([1-(2.2 + 1.8)/k, 2.2/k, 1.8/k])
# output of y = [0.98 , 0.011, 0.009]
def squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):
        if k in HomeG_Time:
            ssd.append(sum((x2 - y) ** 2))
        elif k in AwayG_Time:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd
sum(squared_diff(x1, x2, x3, y))
Out[37]: 7.880400000000012
This output is for the first row only.

Here is the complete snippet as given:
>>> import numpy as np
>>> x1 = np.array( [1,0,0] )
>>> x2 = np.array( [0,1,0] )
>>> x3 = np.array( [0,0,1] )
>>> total_timeslot = 200
>>> HomeG_Time = [93, 109, 187]
>>> AwayG_Time = [90, 177]
>>> ExpG_Home=2.2
>>> ExpG_Away=1.8
>>> y = np.array( [1 - (ExpG_Home + ExpG_Away), ExpG_Home, ExpG_Away] )
>>> def squared_diff(x1, x2, x3, y):
...     ssd = []
...     for k in range(total_timeslot):
...         if k in HomeG_Time:
...             ssd.append(sum((x2 - y) ** 2))
...         elif k in AwayG_Time:
...             ssd.append(sum((x3 - y) ** 2))
...         else:
...             ssd.append(sum((x1 - y) ** 2))
...     return ssd
...
>>> sum(squared_diff(x1, x2, x3, y))
4765.599999999989
Taking that as the intended calculation, compute y as an (N, 3) array using pandas.DataFrame.apply:
>>> y = np.array( df.apply(lambda row: [1 - (row.ExpG_Home + row.ExpG_Away),
...                                     row.ExpG_Home, row.ExpG_Away ],
...                        axis=1).tolist() )
>>> y.shape
(5, 3)
Now calculate the squared error for a given x:
>>> def squared_diff(x, y):
...     return np.sum( np.square(x - y), axis=1)
In your case, squared_diff(x2, y) is added once per entry in HomeG_Time (and likewise squared_diff(x3, y) once per entry in AwayG_Time), so count the occurrences:
>>> n3 = df.AwayG_Time.apply(len)
>>> n2 = df.HomeG_Time.apply(len)
>>> n1 = 200 - (n2 + n3)
The final sum of squared errors (as per your calculation) is:
>>> squared_diff(x1, y) * n1 + squared_diff(x2, y) * n2 + squared_diff(x3, y) * n3
0    4766.4
1    2349.4
2    2354.4
3    6411.6
4    4496.2
dtype: float64

Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Id': [102,103,104,303,305],'ExpG_Home':[1.8,1.5,1.6,1.8,2.9],
                   'ExpG_Away':[2.2,1.3,1.2,2.8,0.8],
                   'HomeG_Time':[[93, 109, 187],[169], [31, 159],[176],[16, 48, 66, 128]],
                   'AwayG_Time':[[90, 177],[],[],[123,136],[40]]})
x1 = [1,0,0]
x2 = [0,1,0]
x3 = [0,0,1]
k=1
total_timeslot = 200 # number of timeslot per game.
def squared_diff(x1, x2, x3,AwayG_Time,HomeG_Time, y):
    ssd = []
    for k in range(total_timeslot):
        if k in HomeG_Time:
            ssd.append(sum((x2 - y) ** 2))
        elif k in AwayG_Time:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd
s=pd.DataFrame( pd.concat([df,1-(df['ExpG_Home']+df['ExpG_Away'])/k,df['ExpG_Home']/k,df['ExpG_Away']/k],axis=1).values)
# The positional indices below assume the alphabetical column order shown in the
# output (as produced by older pandas): 0 = AwayG_Time, 3 = HomeG_Time, 5-7 = y
df['res']=s.apply(lambda x: sum(squared_diff(x1,x2,x3,x[0],x[3],np.array([x[5],x[6],x[7]]))),axis=1)
del s
print(df)
Output:
   AwayG_Time  ExpG_Away  ExpG_Home         HomeG_Time   Id     res
0   [90, 177]        2.2        1.8     [93, 109, 187]  102  4766.4
1          []        1.3        1.5              [169]  103  2349.4
2          []        1.2        1.6          [31, 159]  104  2354.4
3  [123, 136]        2.8        1.8              [176]  303  6411.6
4        [40]        0.8        2.9  [16, 48, 66, 128]  305  4496.2

def squared_diff(row):
    y = np.array([1 - (row.ExpG_Home + row.ExpG_Away), row.ExpG_Home, row.ExpG_Away])
    HomeG_Time = row.HomeG_Time
    AwayG_Time = row.AwayG_Time
    x1 = np.array([1, 0, 0])
    x2 = np.array([0, 1, 0])
    x3 = np.array([0, 0, 1])
    total_timeslot = 200
    ssd = []
    for k in range(total_timeslot):
        if k in HomeG_Time:
            ssd.append(sum((x2 - y) ** 2))
        elif k in AwayG_Time:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)
df.apply(squared_diff, axis=1)
Out[]:
0    4766.4
1    2349.4
2    2354.4
3    6411.6
4    4496.2

iterating through multiple equations [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a problem simplified into the following:
X_{n+1} = X_n + Y_n
Y_{n+1} = Y_n + Z_n
Z_{n+1} = Z_n + X_n
I know that X_0, Y_0 and Z_0 are all equal to 1.
I want Python to compute X_1, Y_1, Z_1, then X_2, Y_2, Z_2, and so on. Can anyone help me with that? I think I have to use a nested loop, but I am not sure exactly how to go about it. Thanks!

Are we talking about something as simple as:
x, y, z = 1, 1, 1
for i in range(10):
    print("X{i} = {x}, Y{i} = {y}, Z{i} = {z}".format(**locals()))
    x, y, z = x + y, y + z, z + x
which doesn't seem right as the output is so uninteresting:
X0 = 1, Y0 = 1, Z0 = 1
X1 = 2, Y1 = 2, Z1 = 2
X2 = 4, Y2 = 4, Z2 = 4
X3 = 8, Y3 = 8, Z3 = 8
X4 = 16, Y4 = 16, Z4 = 16
X5 = 32, Y5 = 32, Z5 = 32
X6 = 64, Y6 = 64, Z6 = 64
X7 = 128, Y7 = 128, Z7 = 128
X8 = 256, Y8 = 256, Z8 = 256
X9 = 512, Y9 = 512, Z9 = 512

Below is a sample function that does this:
def solve_equation(n):
    # Store every term, keyed by its index n
    X = {0: 1}
    Y = {0: 1}
    Z = {0: 1}
    for i in range(n):
        print('For n:', i + 1)
        X[i+1] = X[i] + Y[i]
        Y[i+1] = Y[i] + Z[i]
        Z[i+1] = Z[i] + X[i]
        print('X =', X[i+1], 'Y =', Y[i+1], 'Z =', Z[i+1])
Sample run:
>>> solve_equation(3)
For n: 1
X = 2 Y = 2 Z = 2
For n: 2
X = 4 Y = 4 Z = 4
For n: 3
X = 8 Y = 8 Z = 8

You can use a recursive function. For faster code, consider a generator (yield) instead, since this recursion recomputes earlier terms and its running time grows exponentially with n.
def Cvalue(c, xyz, n):
    # c = (x0, y0, z0); xyz selects the sequence: 1 = X, 2 = Y, 3 = Z
    if n == 0:
        res = c[xyz - 1]
    else:
        if xyz == 1: res = Cvalue(c, 1, n-1) + Cvalue(c, 2, n-1)
        elif xyz == 2: res = Cvalue(c, 2, n-1) + Cvalue(c, 3, n-1)
        elif xyz == 3: res = Cvalue(c, 3, n-1) + Cvalue(c, 1, n-1)
        else: raise ValueError("xyz must be 1, 2 or 3")
    return res
def XYZvalues(x0, y0, z0, n):
    c = (x0, y0, z0)
    x = Cvalue(c, 1, n)
    y = Cvalue(c, 2, n)
    z = Cvalue(c, 3, n)
    return (x, y, z)
print(XYZvalues(1, 1, 1, 3))
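The yield suggestion could look roughly like the sketch below. The generator name `xyz_sequence` is made up for illustration. It produces each triple in constant time from the previous one, so there is no repeated work.

```python
from itertools import islice


def xyz_sequence(x0, y0, z0):
    """Yield (X_n, Y_n, Z_n) for n = 0, 1, 2, ... without recursion."""
    x, y, z = x0, y0, z0
    while True:
        yield x, y, z
        # Tuple assignment evaluates the right-hand side first, so all
        # three updates use the values from step n
        x, y, z = x + y, y + z, z + x


first_four = list(islice(xyz_sequence(1, 1, 1), 4))
print(first_four)  # [(1, 1, 1), (2, 2, 2), (4, 4, 4), (8, 8, 8)]
```

Since the generator is infinite, `islice` (or a `break` in a loop) decides how many terms are taken.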

Magic Squares - Siamese Method

Is it possible to do magic squares with the Siamese/De La Loubere method without using modulo?
I would like to make odd n x n magic squares using it.

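Yes, it's possible. Written in Python 3.5 (the only modulo is in the assert that checks n is odd; the construction itself wraps around with comparisons):
def siamese_method(n):
    assert(n % 2 != 0), 'Square side size should be odd!'
    square = [[0 for x in range(n)] for x in range(n)]
    # Start in the middle of the top row
    x = 0
    y = int((n + 1) / 2 - 1)
    square[x][y] = 1
    for i in range(2, n * n + 1):
        x_old = x
        y_old = y
        # Move one row up, wrapping from the top row to the bottom row
        if x == 0:
            x = n - 1
        else:
            x -= 1
        # Move one column right, wrapping from the last column to the first
        if y == n - 1:
            y = 0
        else:
            y += 1
        # If the target cell is taken, go to the cell below the previous one
        while square[x][y] != 0:
            if x == n - 1:
                x = 0
            else:
                x = x_old + 1
            y = y_old
        square[x][y] = i
    for j in square:
        print(j)
siamese_method(3)
This produces the following output: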
[8, 1, 6]
[3, 5, 7]
[4, 9, 2]
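A quick way to sanity-check a result like the one above is to confirm that every row, every column, and both diagonals share the same sum. The helper name `is_magic` is hypothetical, and like the construction itself this sketch needs no modulo.

```python
def is_magic(square):
    """Check that all rows, columns and both diagonals share one sum."""
    n = len(square)
    target = sum(square[0])
    rows_ok = all(sum(row) == target for row in square)
    cols_ok = all(sum(square[r][c] for r in range(n)) == target for c in range(n))
    diag_ok = sum(square[i][i] for i in range(n)) == target
    anti_ok = sum(square[i][n - 1 - i] for i in range(n)) == target
    return rows_ok and cols_ok and diag_ok and anti_ok


print(is_magic([[8, 1, 6], [3, 5, 7], [4, 9, 2]]))  # True
print(is_magic([[7, 1, 6], [3, 5, 9], [4, 8, 2]]))  # False
```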
