Python using row index as variable input to equation within numpy array

I can't figure out how to do this in Python without writing a for loop. I'm hoping you can teach me a simpler way.
I trimmed the code down to the relevant parts. I'm doing a polyfit and then want to use the resulting a and b coefficients, coeff[0] and coeff[1], to fill an array with the corresponding y values: y = ax + b
I can brute force it and included two methods here, but they're both clunky.
import numpy as np
raw = [0, 3, 6, 8, 11, 15]
coeff = np.polyfit(np.arange(0, len(raw)), raw[:], 1) #fits slope of values in raw
fit = np.zeros(shape=(len(raw), 2))
fit[:,0] = np.arange(0,fit.shape[0]) # this creates an index so I can use the row index as the "x" variable
fit[:,1] = fit[:,0]*coeff[0] + fit[:,0]*coeff[1] # calculating y = ax * b in column [1]
## Alternate method with the for loop
for_fit = np.zeros(len(raw))
for i in range(0,len(raw)) :
    for_fit[i] = i*coeff[0] + i*coeff[1]

I tried to make it a little bit cleaner. The main issue I saw is that you did not use the formula y = ax+b but rather y=ax+bx.
import numpy as np
raw = [0, 3, 6, 8, 11, 15]
x = np.arange(0, len(raw))
coeff = np.polyfit(x, raw[:], 1)
y = x*coeff[0] + coeff[1]
To visualise the result we can use:
import matplotlib.pyplot as plt
plt.plot(x, raw, 'bo')
plt.plot(x, y, 'r')
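As an alternative to writing out the line equation by hand, np.polyval evaluates the fitted polynomial at every x in one call. A minimal sketch, reusing the data from the question (the names y_manual, y_polyval and fit are only illustrative):

```python
import numpy as np

raw = [0, 3, 6, 8, 11, 15]
x = np.arange(len(raw))

# polyfit returns the highest power first: coeff[0] is the slope a, coeff[1] the intercept b
coeff = np.polyfit(x, raw, 1)

# evaluate a*x + b for every x at once, without a loop
y_manual = x * coeff[0] + coeff[1]
y_polyval = np.polyval(coeff, x)

# two-column layout from the question: column 0 holds x, column 1 holds the fitted y
fit = np.column_stack((x, y_polyval))
```

Both expressions give the same values; polyval has the advantage that it keeps working unchanged if the degree of the fit is raised later.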
#EDIT
Are you looking for something like this?
y_arr = np.empty((10, len(x)))
for i in range(10):
    ...
    y_arr[i] = y

Related

How to rotate a 1D line graph array in python/numpy by angle?

I’d like to rotate a line graph so that it lies horizontally. So far, I have the target angle but I’m not able to rotate the graph array (the blue graph in the plot).
import matplotlib.pyplot as plt
import numpy as np
x = [5, 6.5, 7, 8, 6, 5, 3, 4, 3, 0]
y = range(len(x))
best_fit_line = np.poly1d(np.polyfit(y, x, 1))(y)
angle = np.rad2deg(np.arctan2(y[-1] - y[0], best_fit_line[-1] - best_fit_line[0]))
print("angle: " + str(angle))
plt.figure(figsize=(8, 6))
plt.plot(x)
plt.plot(best_fit_line, "--", color="r")
plt.show()
The target calculations of the array should look like this (please ignore the red line):
If you have some advice, please let me know. Thanks.
This question is very helpful, in particular the answer by @Mr Tsjolder. Adapting that to your question, I had to subtract 90 from the angle you calculated to get the result you want:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import transforms
x = [5, 6.5, 7, 8, 6, 5, 3, 4, 3, 0]
y = range(len(x))
best_fit_line = np.poly1d(np.polyfit(y, x, 1))(y)
angle = np.rad2deg(np.arctan2(y[-1] - y[0], best_fit_line[-1] - best_fit_line[0]))
print("angle: " + str(angle))
plt.figure(figsize=(8, 6))
base = plt.gca().transData
rotation = transforms.Affine2D().rotate_deg(angle - 90)
plt.plot(x, transform = rotation + base)
plt.plot(best_fit_line, "--", color="r", transform = rotation + base)
Follow-up question: What if we just need the numerical values of the rotated points?
Then the matplotlib approach can still be useful. From the rotation object we introduced above, matplotlib can extract the transformation matrix, which we can use to transform any point:
# extract the 2x2 rotation part of the transformation matrix
M = rotation.get_matrix()[:2, :2]
# example: transform the first point (x=0, y=5)
print(M @ [0, 5])
[-2.60096617 4.27024297]
The slicing was done to get the dimensions we're interested in, since the rotation happens only in 2D. You can see that the first point from your original data gets transformed to (-2.6, 4.3), agreeing with my plot of the rotated graph above.
In this way you can rotate any point you're interested in, or write a loop to catch them all.
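If matplotlib is not needed for the numbers, the loop can also be avoided by rotating all points in one matrix product. The following is a sketch that assumes the same counter-clockwise rotation by angle - 90 degrees used for the plot; the names idx, R, points and rotated are illustrative and not part of the original answer:

```python
import numpy as np

x = [5, 6.5, 7, 8, 6, 5, 3, 4, 3, 0]
idx = np.arange(len(x))

best_fit_line = np.poly1d(np.polyfit(idx, x, 1))(idx)
angle = np.rad2deg(np.arctan2(idx[-1] - idx[0], best_fit_line[-1] - best_fit_line[0]))

# counter-clockwise rotation matrix for the angle used in the plot (angle - 90 degrees)
theta = np.radians(angle - 90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# one row per point: (position along the horizontal axis, data value)
points = np.column_stack((idx, x))

# rotate every point at once; row i of the result is R @ points[i]
rotated = points @ R.T
print(rotated[0])
```

The first row should agree with the transformed first point shown above, roughly (-2.6, 4.3).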
Arne's answer is perfect if you want to rotate the graph with matplotlib. If not, you can take a look at this code:
import matplotlib.pyplot as plt
import numpy as np
def rotate_vector(data, angle):
    # source:
    # https://datascience.stackexchange.com/questions/57226/how-to-rotate-the-plot-and-find-minimum-point
    # make rotation matrix
    theta = np.radians(angle)
    co = np.cos(theta)
    si = np.sin(theta)
    rotation_matrix = np.array(((co, -si), (si, co)))
    # rotate data vector
    rotated_vector = data.dot(rotation_matrix)
    return rotated_vector
x = [5, 6.5, 7, 8, 6, 5, 3, 4, 3, 0]
y = range(len(x))
best_fit_line = np.poly1d(np.polyfit(y, x, 1))(y)
angle = np.rad2deg(np.arctan2(y[-1] - y[0], best_fit_line[-1] - best_fit_line[0]))
print("angle:", angle)
# rotate blue line
d = np.hstack((np.vstack(y), np.vstack(x)))
xr = rotate_vector(d, -(angle - 90))
# rotate red line
dd = np.hstack((np.vstack(y), np.vstack(best_fit_line)))
xxr = rotate_vector(dd, -(angle - 90))
plt.figure(figsize=(8, 6))
plt.plot(xr[:, 1]) # or plt.plot(xr[:, 0], xr[:, 1])
plt.plot(xxr[:, 1], "--", color="r")
plt.show()

Looping in Python for a beginner

I am new to coding and looking for a simple way to implement a loop in python. Here is an example of my code! I need to define variables u,v,w etc. from 1 through to 12 to carry out my regression analysis, hence why a loop would be ideal. Thanks!
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm
dataset = pd.read_csv("MultipleRegression.csv")
x1 = np.append(arr = np.ones((4, 1)).astype(int), values = x1, axis = 1)
x_opt1 = x1[:, [0, 1, 2, 3, 4, 5, 6]]
regressor_OLS1 = sm.OLS(endog = y1, exog = x_opt1).fit()
regressor_OLS1.summary()
u1 = regressor_OLS1.params[1]
v1 = regressor_OLS1.params[2]
w1 = regressor_OLS1.params[3]
x1 = regressor_OLS1.params[4]
y1 = regressor_OLS1.params[5]
z1 = regressor_OLS1.params[6]
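In Python you can do that without a loop by unpacking the parameters. Your code starts at params[1], so the first element (params[0]) is skipped here with an underscore:
_, u1, v1, w1, x1, y1, z1, *rest = regressor_OLS1.params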
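Since the question asks for the same six values for regressions 1 through 12, a container may be easier to handle than numbered variables. A minimal sketch under stated assumptions: params and all_params are hypothetical stand-ins for the real regressor_OLS1.params arrays, and index 0 is skipped to match params[1] through params[6] in the question:

```python
import numpy as np

# hypothetical stand-in for regressor_OLS1.params: index 0 first, then six further values
params = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# skip index 0, matching params[1] ... params[6] in the question
_, u1, v1, w1, x1, y1, z1 = params

# placeholder data for 12 fits; in practice each entry would come from one fitted model
all_params = [params * k for k in range(1, 13)]

# one dictionary per regression, keyed by regression number 1..12
coeffs = {k: dict(zip("uvwxyz", p[1:])) for k, p in enumerate(all_params, start=1)}
```

With this layout, coeffs[3]["w"] plays the role of a variable named w3.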

Putting a gap/break in a pyplot line plot without losing data

I have a time series with several large data gaps. I would like to see a connecting line between data points that are less than an hour apart, but not if the gap is larger. The accepted answer to the question, Put a gap/break in a line plot, would work except that you sacrifice the masked points. I would like to avoid that.
I have attempted to write a list comprehension that inserts NaNs into the array, which I think would automatically achieve the same result, but I don't seem to be able to do it correctly. The best I have found is as follows:
import datetime as dtm
import numpy as np
x = np.array([dtm.datetime(2001,4,3,0,47,30),dtm.datetime(2001,4,3,0,52,30),dtm.datetime(2001,4,3,0,57,30),dtm.datetime(2001,4,3,3,57,30),dtm.datetime(2001,4,3,4,2,30),dtm.datetime(2001,4,3,4,7,30)])
xmod = np.array([x[0]]+[dt1 if dt1-dt0 < dtm.timedelta(hours=1.) else [dt1,np.nan] for dt1, dt0 in zip(x[1:],x[:-1])])
This gives the result:
In [7]: xmod
Out[7]:
array([datetime.datetime(2001, 4, 3, 0, 47, 30),
datetime.datetime(2001, 4, 3, 0, 47, 30),
datetime.datetime(2001, 4, 3, 0, 52, 30),
[datetime.datetime(2001, 4, 3, 0, 57, 30), nan],
datetime.datetime(2001, 4, 3, 3, 57, 30),
datetime.datetime(2001, 4, 3, 4, 2, 30)], dtype=object)
I have not been able to find a way to insert both the data point and the np.nan without putting brackets around them. Is this possible? Is there a better way to achieve my goal? Thanks!
In accordance with the comment above, probably the easiest way to do this would be to separate the data into groups where you need the gaps. Here is one way to implement such a thing.
import datetime as dtm
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
x = np.array([dtm.datetime(2001,4,3,0,47,30),dtm.datetime(2001,4,3,0,52,30),dtm.datetime(2001,4,3,0,57,30),
dtm.datetime(2001,4,3,3,57,30),dtm.datetime(2001,4,3,4,2,30),dtm.datetime(2001,4,3,4,7,30)])
y = range(len(x))
# make a dataframe with groups separated that are over an hour apart
data = []
g = 0
for i in range(len(x)):
    x0 = x[i]
    y0 = y[i]
    if i < (len(x)-1):
        x1 = x[i+1]
        td = x1 - x0
        elapsed_seconds = td.total_seconds()
        hrs = (elapsed_seconds/60)/60
        if hrs < 1:
            data.append([x0,y0, g])
        else:
            data.append([x0,y0, g])
            g+=1
    else:
        data.append([x0,y0, g])
df = pd.DataFrame(data, columns=['x', 'y', 'group'])
# draw a plot
fig, ax = plt.subplots(1,1, figsize = (8,5))
for i, dfg in df.groupby('group'):
    ax.plot(dfg['x'], dfg['y'], c='b')
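The group numbers built by the loop above can also be computed without a loop. This is only a sketch, and it assumes the timestamps are converted to a NumPy datetime64 array (the names new_group and group are illustrative):

```python
import datetime as dtm
import numpy as np

x = np.array([dtm.datetime(2001, 4, 3, 0, 47, 30), dtm.datetime(2001, 4, 3, 0, 52, 30),
              dtm.datetime(2001, 4, 3, 0, 57, 30), dtm.datetime(2001, 4, 3, 3, 57, 30),
              dtm.datetime(2001, 4, 3, 4, 2, 30), dtm.datetime(2001, 4, 3, 4, 7, 30)],
             dtype="datetime64[s]")

# True wherever the step to the next point is an hour or more
new_group = np.diff(x) >= np.timedelta64(1, "h")

# the running count of gaps seen so far gives each point its group number
group = np.concatenate(([0], np.cumsum(new_group)))
print(group)
```

The resulting array can be used as the group column of the DataFrame, after which the groupby plotting loop stays the same.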
So, I accepted the answer by djakubosky because it seems clean and is probably the right approach. However, by the time that answer was posted, I had decided that what I was doing was inappropriate for a list comprehension and simply wrote it as a for loop - and that worked fine. Possibly this will be useful to someone else. Here is the code:
def insert_breaks(x,y):
    import datetime as dtm
    import numpy as np
    xnew = []
    ynew = []
    for dt1, dt0, y1, y0 in zip(x[1:],x[:-1],y[1:],y[:-1]):
        if dt1-dt0 < dtm.timedelta(hours=1):
            xnew+=[dt0]
            ynew+=[y0]
        else:
            xnew+=[dt0,dt0+(dt1-dt0)/2]
            ynew+=[y0, np.nan]
    xnew+=[dt1]
    ynew+=[y1]
    return xnew, ynew
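The same idea, a NaN placed at the midpoint of every gap of an hour or more, can be sketched with np.insert instead of a loop. It assumes datetime64 timestamps and float y values; the names step, gap and pos are illustrative:

```python
import datetime as dtm
import numpy as np

x = np.array([dtm.datetime(2001, 4, 3, 0, 47, 30), dtm.datetime(2001, 4, 3, 0, 52, 30),
              dtm.datetime(2001, 4, 3, 0, 57, 30), dtm.datetime(2001, 4, 3, 3, 57, 30),
              dtm.datetime(2001, 4, 3, 4, 2, 30), dtm.datetime(2001, 4, 3, 4, 7, 30)],
             dtype="datetime64[s]")
y = np.arange(len(x), dtype=float)  # float so that NaN can be stored

step = np.diff(x)                       # time between consecutive points
gap = step >= np.timedelta64(1, "h")    # True where a break is wanted
pos = np.flatnonzero(gap) + 1           # insert before the first point after each gap

# midpoint of each gap on the time axis, NaN on the value axis
x_new = np.insert(x, pos, x[pos - 1] + step[gap] / 2)
y_new = np.insert(y, pos, np.nan)
```

Plotting x_new against y_new should leave the line open at each NaN while keeping every original data point.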

MatPlotLib: Scatter with multiple y values to one x value, and regression lines

I would like to create a scatter plot in matplotlib to measure the performance of my algorithm.
An example of my data is as follows:
x = [1, 2, 3, 4, 5]
y1 = [1, 2, 3] # corresponding to x = 1
y2 = [4, 5, 6] # corresponding to x = 2
y3 = [7, 8, 9] # corresponding to x = 3
y4 = [10, 11, 12] # corresponding to x = 4
y5 = [13, 14, 15] # corresponding to x = 5
What data type would be best to represent multiple y values with one x value?
In my example the relation is exponential. Is there a way to plot an exponential regression line in matplotlib?
I think this is mainly a data-analysis question. If I understand correctly, you want to compare the time efficiency of each test, with every test run in the same environment (the same machine, the same input data, etc.). As a suggestion, you can use each test's average run time as the reference value when showing your results. Here is some code you can use.
import numpy as np
import matplotlib.pyplot as plt
data_dim = 4 # number of test
data_points = 100 # number of each test_data_points
data_set = np.random.rand(data_dim,data_points)
time = [ list(range(len(i))) for i in data_set]
norm = np.full((data_dim,data_points),1)
aver = [] # get each test's average value
for ndx, i in enumerate(norm):
    # use the data of test number ndx, not always the first test
    aver.append(i * sum(data_set[ndx]) / data_points)
fig = plt.figure(figsize=(10,10))
ndx = 1
for i in range(0,2):
    for j in range(0,2):
        ax = fig.add_subplot(2,2,ndx)
        ax.plot(time[ndx-1],data_set[ndx-1],'ko')
        ax.plot(time[ndx-1],aver[ndx-1],'r')
        ax.set_ylim(-1,2)
        ndx += 1
plt.show()
The following is the result of running the code. Note that the red solid line is the average of each test's times, which gives a sense of how each test performed.
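Coming back to the two parts of the original question, one possible layout is a 2D array with one row per x value, and an exponential trend can be fitted as a straight line in log space. This is a hedged sketch rather than the approach of the answer above; the model y = a * exp(b * x) and the names ys, x_flat, y_flat and y_fit are assumptions made for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
# one row per x value, one column per repeated measurement
ys = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [10, 11, 12],
               [13, 14, 15]])

# repeat each x once per measurement so x and y line up as flat arrays
x_flat = np.repeat(x, ys.shape[1])
y_flat = ys.ravel()

# assumed model y = a * exp(b * x), fitted as a line in log space
# (this requires all y values to be positive)
b, log_a = np.polyfit(x_flat, np.log(y_flat), 1)
y_fit = np.exp(log_a) * np.exp(b * x)
```

The flat arrays can be passed to plt.scatter(x_flat, y_flat), and the fitted curve drawn with plt.plot(x, y_fit).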

Creating a Smooth Line based on Points

I have the following dataset:
x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
and my goal is to create a smooth curve that looks like this:
Is there a way to do it in Python?
I have attempted to use the method shown here, and this is the code:
from scipy.interpolate import spline
import matplotlib.pyplot as plt
import numpy as np
x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
xnew = np.linspace(min(x), max(x), 100)
y_smooth = spline(x, y, xnew)
plt.plot(xnew, y_smooth)
plt.show()
but the output shows a weird line.
First, interpolate.spline() has been deprecated, so you should probably not use that. Instead use interpolate.splrep() and interpolate.splev(). It's not a difficult conversion:
y_smooth = interpolate.spline(x, y, xnew)
becomes
tck = interpolate.splrep(x, y)
y_smooth = interpolate.splev(xnew, tck)
But that's not really the issue here. By default, splrep() fits a spline of degree 3 (k=3), which is not a natural shape for your data. Because there are so few points, it can still pass through your data even though the result is a non-intuitive approximation. You can set the degree it uses with the k=... argument to splrep(). But the same is true even for degree 2: it tries to fit a parabola, and your data can be matched by a parabola with a bow in the middle (which is what happens now, since the slope is so steep at the beginning and there are no data points in the middle).
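To make the effect of k concrete, here is a small self-contained sketch that fits the same five points with the default degree and with k=1. The variable names are illustrative only:

```python
import numpy as np
from scipy import interpolate

x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
xnew = np.linspace(min(x), max(x), 100)

# default: cubic spline (k=3) that passes through every data point
tck_cubic = interpolate.splrep(x, y)
y_cubic = interpolate.splev(xnew, tck_cubic)

# k=1 gives straight segments between the points, so it cannot bow between them
tck_linear = interpolate.splrep(x, y, k=1)
y_linear = interpolate.splev(xnew, tck_linear)
```

Plotting y_cubic and y_linear against xnew shows how much of the odd shape comes from the degree of the spline rather than from the data.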
In your case, your data is much more accurately represented as an exponential, so it'd be best to fit an exponential. I'd recommend using scipy.optimize.curve_fit(). It lets you specify your own fitting function which contains parameters and it'll fit the parameters for you:
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
xnew = np.linspace(min(x), max(x), 100)
def expfunc(x, a, b, c):
    return a * np.exp(-b * x) + c
popt, pcov = curve_fit(expfunc, x, y)
plt.plot(xnew, expfunc(xnew, *popt))
plt.show()
