I am trying to create a ternary plot of soil types with text over specific areas to declare what type of soil each area represents. I accomplished this, but the contrast is not great, so I wanted to put a box behind the text to help it stand out... the problem is that the coordinates I derived using pen and paper are yielding bizarre results. Here is the relevant code chunk:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
df = pd.DataFrame({"Clay (%)": [], "Silt (%)": [], "Sand (%)": []})
fig = go.Figure(px.scatter_ternary(df, a="Clay (%)", c="Silt (%)", b="Sand (%)"))
# Clay text_bg
a0, a1 = 0.60, 0.60 # bottom left/right vertices
da = 0.09 # change in vertical height
a2, a3 = a1+da, a0+da # top right/left vertices
b0 = 0.26 # bottom left (eyeballed)
b1 = 0.13 # bottom right (eyeballed)
b2 = b1-da # top right
b3 = b0-da # top left
c0 = 0.10 # bottom left (eyeballed)
c1 = 0.24 # bottom right (eyeballed)
c2 = c1-da # top right
c3 = c0-da # top left
fig.add_trace(
    go.Scatterternary(a=[a0, a1, a2, a3, a0],
                      b=[b0, b1, b2, b3, b0],
                      c=[c0, c1, c2, c3, c0],
                      mode='lines',
                      line=dict(width=1.5, color='black'),
                      showlegend=False,
                      fill='toself',
                      fillcolor='rgba(1.0,1.0,1.0,0.5)')
)
fig.show()
Clearly this is just the code for the box around the word "Clay"; I can provide the rest of my code if needed.
--> a represents the "Clay (%)" axis
--> b represents the "Sand (%)" axis
--> c represents the "Silt (%)" axis
I try to generate the box by starting at its bottom-left vertex and moving counterclockwise. The first two glaring issues are that I have the base height (a0) set to 0.60, but it looks like it starts closer to 0.63, and that the height should stay constant as it moves to the bottom-right vertex, but it clearly decreases. Next, the top height should be 0.69, but it looks closer to 0.79 and is again not constant.
Since the axes are separated by angles of 60 degrees, you can derive my expressions for the formulated 'b' and 'c' vertices with the law of sines -- that is to say, the change in 'a' should equal the change in 'b' and 'c' when moving vertically.
What is going on here?
BTW: running in a Jupyter notebook (IPython 7.31.1, JupyterLab 3.4.4, Plotly 5.9.0) on a Windows 11 machine.
I am bad at geometry... the issue was that when the value of a changes by some distance da, b and c do not each change by da; they change by da/sqrt(3). I still do not understand the output of the code when I input the wrong coordinates... but regardless, here is the corrected code chunk:
import numpy as np

# Clay text_bg
a0, a1 = 0.60, 0.60
da = 0.09
d_bc = da/np.sqrt(3)
a2, a3 = a1+da, a0+da
b0 = 0.30
b1 = 0.10
b2 = b1-d_bc
b3 = b0-d_bc
c0 = 0.10
c1 = 0.30
c2 = c1-d_bc
c3 = c0-d_bc
fig.add_trace(
    go.Scatterternary(a=[a0, a1, a2, a3, a0],
                      b=[b0, b1, b2, b3, b0],
                      c=[c0, c1, c2, c3, c0],
                      mode='lines',
                      line=dict(width=1.5, color='black'),
                      showlegend=False,
                      fill='toself',
                      fillcolor='rgba(1.0,1.0,1.0,0.5)')
)
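For what it's worth, the same construction can be wrapped in a small helper so each label box only needs its base height, its two eyeballed base positions, and its thickness. This is a sketch using the geometry above; the function name add_label_box is my own, and I switched the fill to rgba(255,255,255,0.5), since CSS-style rgba channels run 0-255, so rgba(1.0,1.0,1.0,0.5) renders as semi-transparent black rather than white.

import numpy as np
import plotly.graph_objects as go

def add_label_box(fig, a_base, b_left, b_right, da):
    # Quadrilateral behind a ternary label, vertices counterclockwise from
    # the bottom-left corner; b/c shift by da/sqrt(3) per the fix above.
    d_bc = da / np.sqrt(3)
    c_left = 1.0 - a_base - b_left    # force each base vertex to sum to 1
    c_right = 1.0 - a_base - b_right
    fig.add_trace(
        go.Scatterternary(a=[a_base, a_base, a_base + da, a_base + da, a_base],
                          b=[b_left, b_right, b_right - d_bc, b_left - d_bc, b_left],
                          c=[c_left, c_right, c_right - d_bc, c_left - d_bc, c_left],
                          mode='lines',
                          line=dict(width=1.5, color='black'),
                          showlegend=False,
                          fill='toself',
                          fillcolor='rgba(255,255,255,0.5)'))

add_label_box(fig, 0.60, 0.30, 0.10, 0.09)  # reproduces the "Clay" box above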
Based on H2O's documentation, it would seem as though relevel('most_frequent_category') and relevel_by_frequency() should accomplish the same thing. However, the coefficient estimates differ depending on which method is used to set the reference level for a factor column.
Using an open-source dataset from sklearn demonstrates how the GLM coefficients are misaligned when the base level is set using the two releveling methods. Why do the coefficient estimates vary when the base level is the same between the two models?
import pandas as pd
from sklearn.datasets import fetch_openml
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init(max_mem_size=8)
def load_mtpl2(n_samples=100000):
    """
    Fetch the French Motor Third-Party Liability Claims dataset.
    https://scikit-learn.org/stable/auto_examples/linear_model/plot_tweedie_regression_insurance_claims.html

    Parameters
    ----------
    n_samples: int, default=100000
        Number of samples to select (for faster run time). The full dataset
        has 678013 samples.
    """
    # freMTPL2freq dataset from https://www.openml.org/d/41214
    df_freq = fetch_openml(data_id=41214, as_frame=True)["data"]
    df_freq["IDpol"] = df_freq["IDpol"].astype(int)
    df_freq.set_index("IDpol", inplace=True)
    # freMTPL2sev dataset from https://www.openml.org/d/41215
    df_sev = fetch_openml(data_id=41215, as_frame=True)["data"]
    # sum ClaimAmount over identical IDs
    df_sev = df_sev.groupby("IDpol").sum()
    df = df_freq.join(df_sev, how="left")
    df["ClaimAmount"].fillna(0, inplace=True)
    # unquote string fields
    for column_name in df.columns[df.dtypes.values == object]:
        df[column_name] = df[column_name].str.strip("'")
    return df.iloc[:n_samples]
df = load_mtpl2()
df.loc[(df["ClaimAmount"] == 0) & (df["ClaimNb"] >= 1), "ClaimNb"] = 0
df["Exposure"] = df["Exposure"].clip(upper=1)
df["ClaimAmount"] = df["ClaimAmount"].clip(upper=100000)
df["PurePremium"] = df["ClaimAmount"] / df["Exposure"]
X_freq = h2o.H2OFrame(df)
X_freq["VehBrand"] = X_freq["VehBrand"].asfactor()
X_freq["VehBrand"] = X_freq["VehBrand"].relevel_by_frequency()
X_relevel = h2o.H2OFrame(df)
X_relevel["VehBrand"] = X_relevel["VehBrand"].asfactor()
X_relevel["VehBrand"] = X_relevel["VehBrand"].relevel("B1") # most frequent category
response_col = "PurePremium"
weight_col = "Exposure"
predictors = "VehBrand"
glm_freq = H2OGeneralizedLinearEstimator(family="tweedie",
solver='IRLSM',
tweedie_variance_power=1.5,
tweedie_link_power=0,
lambda_=0,
compute_p_values=True,
remove_collinear_columns=True,
seed=1)
glm_relevel = H2OGeneralizedLinearEstimator(family="tweedie",
solver='IRLSM',
tweedie_variance_power=1.5,
tweedie_link_power=0,
lambda_=0,
compute_p_values=True,
remove_collinear_columns=True,
seed=1)
glm_freq.train(x=predictors, y=response_col, training_frame=X_freq, weights_column=weight_col)
glm_relevel.train(x=predictors, y=response_col, training_frame=X_relevel, weights_column=weight_col)
print('GLM with the reference level set using relevel_by_frequency()')
print(glm_freq._model_json['output']['coefficients_table'])
print('\n')
print('GLM with the reference level manually set using relevel()')
print(glm_relevel._model_json['output']['coefficients_table'])
Output
GLM with the reference level set using relevel_by_frequency()
Coefficients: glm coefficients
names coefficients std_error z_value p_value standardized_coefficients
------------ -------------- ----------- ---------- ----------- ---------------------------
Intercept 5.40413 1.24082 4.35531 1.33012e-05 5.40413
VehBrand.B2 -0.398721 1.2599 -0.316472 0.751645 -0.398721
VehBrand.B12 -0.061573 1.46541 -0.0420176 0.966485 -0.061573
VehBrand.B3 -0.393908 1.30712 -0.301356 0.763144 -0.393908
VehBrand.B5 -0.282484 1.31929 -0.214118 0.830455 -0.282484
VehBrand.B6 -0.387747 1.25943 -0.307876 0.758177 -0.387747
VehBrand.B4 0.391771 1.45615 0.269047 0.787894 0.391771
VehBrand.B10 -0.0542706 1.35049 -0.040186 0.967945 -0.0542706
VehBrand.B13 -0.306381 1.4628 -0.209449 0.834098 -0.306381
VehBrand.B11 -0.435297 1.29155 -0.337035 0.736091 -0.435297
VehBrand.B14 -0.304243 1.34781 -0.225732 0.821411 -0.304243
GLM with the reference level manually set using relevel()
Coefficients: glm coefficients
names coefficients std_error z_value p_value standardized_coefficients
------------ -------------- ----------- ---------- ---------- ---------------------------
Intercept 5.01639 0.215713 23.2549 2.635e-119 5.01639
VehBrand.B10 0.081366 0.804165 0.101181 0.919407 0.081366
VehBrand.B11 0.779518 0.792003 0.984237 0.325001 0.779518
VehBrand.B12 -0.0475497 0.41834 -0.113663 0.909505 -0.0475497
VehBrand.B13 0.326174 0.80891 0.403227 0.686782 0.326174
VehBrand.B14 0.387747 1.25943 0.307876 0.758177 0.387747
VehBrand.B2 -0.010974 0.306996 -0.0357465 0.971485 -0.010974
VehBrand.B3 -0.00616108 0.464188 -0.0132728 0.98941 -0.00616108
VehBrand.B4 0.333477 0.575082 0.579877 0.561999 0.333477
VehBrand.B5 0.105263 0.497431 0.211613 0.832409 0.105263
VehBrand.B6 0.0835042 0.568769 0.146816 0.883278 0.0835042
The two datasets are almost the same except in one place:
In the first dataset, the number of rows for VehBrand B1 = 72.
In the second dataset, the number of rows for VehBrand B14 = 721.
If you compare the two datasets, you can map the equivalent names to the number of rows as follows:
Freq B2 == Relevel B2 with 26500 rows
Freq B12 == Relevel B13 with 1883 rows
Freq B3 == Relevel B3 with 8260 rows
Freq B5 == Relevel B5 with 6053 rows
Freq B6 == Relevel B1 with 27240 rows
Freq B4 == Relevel B11 with 1774 rows
Freq B10 == Relevel B4 with 3968 rows
Freq B13 == Relevel B10 with 2268 rows
Freq B11 == Relevel B12 with 16619 rows
Freq B14 == Relevel B6 with 4714 rows.
Since you are effectively training the two GLM models on different datasets, you get different coefficients and different prediction results.
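One way to see this directly is to compare the factor levels and the per-level row counts of the two frames. This is a sketch assuming the frames from the question are still in scope; levels() and table() are standard H2OFrame methods.

# The first level in the ordering is the reference level used by the GLM.
print(X_freq["VehBrand"].levels())
print(X_relevel["VehBrand"].levels())

# Row counts per category; compare these across the two frames.
print(X_freq["VehBrand"].table())
print(X_relevel["VehBrand"].table())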
I need to plot the step responses of a MIMO system with the Python control package.
I have tried the function step_response, but it converts the system to SISO before computing the step response, so only one set of outputs is computed.
I then tried the function forced_response with different setups for the input (constant unity value, numpy array of ones, etc., just for the sake of trying).
I get different step responses, i.e. related to other outputs, but not all of the responses (number of inputs x number of outputs).
Here is a minimal sample code that implements a simple 2nd-order model with 2 inputs and 4 outputs and dummy data. Attached is a plot of the responses I get.
[stepResponses plot]
In my test I first run the step_response function; yout turns out to be of size 4 x size_time (so only the 4 outputs for the first input are excited).
Then I run the forced_response function, and youtForced is still of size 4 x size_time, instead of 4 x size_time x 2 (or similar) as I expected (under the hypothesis that forced_response treats the system as MIMO).
Is there a way to get the full set of step responses via the forced_response function (similarly to what the MATLAB step function does)?
Unfortunately the documentation is sparse and there are very few practical examples of this.
Many thanks to whoever can help.
from control import ss, step_response, forced_response
import numpy as np
import matplotlib.pyplot as plt
sz = 2
f1 = 1*2*np.pi
f2 = 1.5*2*np.pi
OM2 = [-f1**2, -f2**2]
ZI = [-2*f1*0.01, -2*f2*0.01]
A11 = np.zeros((sz, sz))
A12 = np.eye(sz)
A21 = np.diag(OM2)
A22 = np.diag(ZI)
A = np.vstack((np.concatenate((A11, A12), axis=1), np.concatenate((A21, A22), axis=1)))
B1 = np.zeros((sz, sz))
B2 = [[1e-6, 1e-7],[2e-6, 2e-7]]
B = np.vstack((B1, B2))
C1 = np.zeros((sz, sz*2))
C1[0] = [1e-4, 2*1e-4, 3*1e-4, 5*1e-5]
C1[1] = [2e-4, 3.5*1e-4, 1.5*1e-4, 2*1e-5]
C2 = np.zeros((sz*2, sz))
C = np.concatenate((C1.T, C2), axis=1)
D = np.zeros((sz*2, sz))
sys = ss(A, B, C, D)
tEnd = 1
time = np.arange(0, tEnd, 1e-3)
tout, youtStep = step_response(sys, T=time)
tout, youtForced, xout = forced_response(sys, T=time, U=1.0)
plt.figure()
for k, y in enumerate(youtStep):
    plt.subplot(4, 1, k+1)
    plt.grid(True)
    plt.plot(tout, y, label='step')
    plt.plot(tout, youtForced[k], '--r', label='forced')
    if k == 0:
        plt.legend()
plt.xlabel('Time [s]')
OK, the step response is easily managed via the function control.matlab.step, which allows selecting the individual inputs of the MIMO system, something I had initially overlooked but which is well documented in the official documentation:
https://python-control.readthedocs.io/en/0.8.1/generated/control.matlab.step.html
Here's the output [MIMO step response output]
Luckily it was an easy fix :)
from control import ss
import control.matlab as ctl
import numpy as np
import matplotlib.pyplot as plt
sz = 2
f1 = 1*2*np.pi
f2 = 1.5*2*np.pi
OM2 = [-f1**2, -f2**2]
ZI = [-2*f1*0.01, -2*f2*0.01]
A11 = np.zeros((sz, sz))
A12 = np.eye(sz)
A21 = np.diag(OM2)
A22 = np.diag(ZI)
A = np.vstack((np.concatenate((A11, A12), axis=1), np.concatenate((A21, A22), axis=1)))
B1 = np.zeros((sz, sz))
B2 = [[1e-6, 1e-7],[2e-6, 2e-7]]
B = np.vstack((B1, B2))
C1 = np.zeros((sz, sz*2))
C1[0] = [1e-4, 2*1e-4, 3*1e-4, 5*1e-5]
C1[1] = [2e-4, 3.5*1e-4, 1.5*1e-4, 2*1e-5]
C2 = np.zeros((sz*2, sz))
C = np.concatenate((C1.T, C2), axis=1)
D = np.zeros((sz*2, sz))
sys = ss(A, B, C, D)
tEnd = 100
time = np.arange(0, tEnd, 1e-3)
yy1, tt1 = ctl.step(sys, T=time, input=0)
yy2, tt2 = ctl.step(sys, T=time, input=1)
plt.figure()
for k in range(0, len(yy1[1,:])):
    plt.subplot(4, 1, k+1)
    plt.grid(True)
    plt.plot(tt1, yy1[:,k], label='input=0')
    plt.plot(tt2, yy2[:,k], label='input=1')
    if k == 0:
        plt.legend()
plt.xlabel('Time [s]')
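As a side note, control.step_response itself also accepts an input keyword, so the same per-input responses can be obtained without the MATLAB compatibility layer. A sketch, assuming the sys and time objects defined above and that your python-control version supports the input= argument:

from control import step_response

plt.figure()
for i in range(2):                      # one step response per input
    tout, yout = step_response(sys, T=time, input=i)
    for k in range(yout.shape[0]):      # one subplot per output
        plt.subplot(4, 1, k+1)
        plt.grid(True)
        plt.plot(tout, yout[k], label='input=%d' % i)
        if k == 0:
            plt.legend()
plt.xlabel('Time [s]')
plt.show()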
I am writing a code to calculate the shortest distance between two sets of points. Essentially, I have created a csv with a bunch of locations in coordinates, and a second csv with a second bunch of locations in coordinates. For example, the coordinates in list A could be (50, -10), (60, 70), (40, -19) and in list B, it could be (40, 87), (60, 90), (23, 20). Everything I have found online to help me calculates between a list and a single point: this won't work for me.
So far I am able to calculate the distance between all the points (so between A1 and B1, A1 and B2, A1 and B3, A2 and B1, etc). That's fine, but what I want is the minimum distance from point 1 in list A to ANY point in list B. Essentially, what position in list B is closest to each point in list A?
I'm trying to find a way to run it so it checks A1 against B1, B2, B3 etc, and then comes back with the shortest distance being x miles between A1 and B3, for example.
What I have so far is below:
import pandas as pd
import geopy.distance
df = pd.read_csv('AirportCoords.csv')
df2 = pd.read_csv('HotelCoords.csv')
for i, row in df2.iterrows():
    coordinate = row.lat, row.long

for i, row in df.iterrows():
    coordinate2 = row.latitude, row.longitude
    distance = geopy.distance.geodesic(coordinate, coordinate2).km
    print(distance)
You're talking about comparing every element of A to every element of B; this implies that you should have a nested loop, but your example code actually has two loops in sequence.
import pandas as pd
import geopy.distance
df = pd.read_csv('AirportCoords.csv')
df2 = pd.read_csv('HotelCoords.csv')
for i, row in df.iterrows():  # A
    a = row.latitude, row.longitude
    distances = []
    for j, row2 in df2.iterrows():  # B
        b = row2.lat, row2.long
        distances.append(geopy.distance.geodesic(a, b).km)
    min_distance = min(distances)
    min_index = distances.index(min_distance)
    print("A", i, "is closest to B", min_index, min_distance, "km")
I am trying to plot a specific course (acceleration over time) using matplotlib. The plot works so far and is shown (see image). j equals 35 and represents the derivative of acceleration over time (the jerk), which in this case is constant.
import numpy as np
import matplotlib.pyplot as plt
def limits_acc_course():
    limits_acc_course.t1 = 0.14285714285714285
    limits_acc_course.t2 = 0.14285714285714285 + 0.10714285714285715
    limits_acc_course.t3 = 2*0.14285714285714285 + 0.10714285714285715
    limits_acc_course.t4 = 2*0.14285714285714285 + 0.10714285714285715 + 0.5*0.24714285714285716
j = 35  # derivative of acceleration over time (constant jerk)
limits_acc_course()
t_end = 2*limits_acc_course.t4
t_1 = np.linspace(0, limits_acc_course.t1)
t_2 = np.linspace(limits_acc_course.t1, limits_acc_course.t2)
t_3 = np.linspace(limits_acc_course.t2, limits_acc_course.t3)
t_4 = np.linspace(limits_acc_course.t3, limits_acc_course.t4)
tk1 = np.array([])
tk2 = np.array([])
tk3 = np.array([])
tk4 = np.array([])
for value1 in t_1:
    tk1 = np.append(tk1, value1*j)
for value2 in t_2:
    tk2 = np.append(tk2, limits_acc_course.t1*j)
for value3 in t_3:
    tk3 = np.append(tk3, (limits_acc_course.t3-value3)*j)
for value4 in t_4:
    tk4 = np.append(tk4, value4*0)
    if value4 == (2*limits_acc_course.t4-limits_acc_course.t3)*j:
        break
t = np.concatenate((tk1, tk2, tk3, tk4), axis=0)
t_neg = (-1)*np.concatenate((tk1, tk2, tk3), axis=0)
t_final = np.concatenate((t, t_neg), axis=0)
t_range = np.linspace(0, t_end, t_final.size)
fig, ax = plt.subplots()
ax.plot(t_range, t_final)
ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.show()
The problem is that the x-coordinates in the plot do not match the calculated values.
The x-values in the plot (see image) should be:
0.142857142857 and 0.25
(or at least with an accuracy like 0.1429).
The x-values in the plot are:
0.144777 and 0.295348
I have tried turning off the offset, I have varied the range from 100 to 2500 values for each part, and I have tried rounding the values, but none of that worked. Further, I have tried using endpoint=False when creating the ranges t_1 to t_4.
By now I have run out of ideas.
The plot is created in an axes which extends over ~500 pixels on screen. The x-axis shows 1.1 units, hence you have 1.1/500 = 0.0022 units per pixel. The mouse cursor cannot know its position more accurately than one pixel, so the coordinate shown by the cursor is only accurate to about ±0.0022 units.
The observed coordinate (0.144777) deviates from the expected coordinate (0.142857142857) by 0.0019 units, which is well within the accuracy of the cursor.
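If you want more digits in the readout anyway, you can override the coordinate formatter of the axes (a sketch; format_coord is a standard matplotlib Axes attribute). Bear in mind the extra digits add no accuracy, since the cursor position is still quantized to whole pixels:

ax = plt.gca()  # or the axes object returned by plt.subplots()
ax.format_coord = lambda x, y: 'x=%.6f  y=%.6f' % (x, y)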
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
m_c,p_s,complete = np.loadtxt('File1.txt',usecols=(1,0,2),unpack=True)
p_d,m_d = np.loadtxt('File2.txt',usecols=(2,3),unpack=True)
p_c,m_c = np.loadtxt('File3.txt',usecols=(1,2),unpack=True)
def function_oc(m_c, p_c, complete, min, max):
    average = 0
    comp = []
    x = 0
    while x < 8000:
        if p_c[x] < 50 and m_c[x] >= min and m_c[x] <= max:
            comp.append(complete[x])
        x += 1
    average = sum(comp)/len(comp)
    return average
average1 = function_oc(m_c,p_c,complete,3,10)
average2 = function_oc(m_c,p_c,complete,10,30)
average3 = function_oc(m_c,p_c,complete,30,100)
average4 = function_oc(m_c,p_c,complete,100,300)
average5 = function_oc(m_c,p_c,complete,300,1000)
def function_pc(m_d, p_d, m_c, p_c, complete):
    f = interpolate.interp2d(m_c, p_c, complete)
    comp_d = f(p_d, m_d)
    return comp_d
comp_d = function_pc(m_d,p_d,m_c,p_c,complete)
def function_d(p_d, m_d, min, max):
    d = 0
    i = 0
    while i < 33:
        if p_d[i] < 50 and m_d[i] >= min and m_d[i] <= max:
            d += 1
        i += 1
    return d
d1 = function_d(p_d,m_d,3,10)
d2 = function_d(p_d,m_d,10,30)
d3 = function_d(p_d,m_d,30,100)
d4 = function_d(p_d,m_d,100,300)
d5 = function_d(p_d,m_d,300,1000)
def function_c(p_c, m_c, min, max):
    c = 0
    y = 0
    while y < 12:
        if p_c[y] < 50 and m_c[y] >= min and m_c[y] <= max:
            c += 1
        y += 1
    return c
c1 = function_c(p_c,m_c,3,10)
c2 = function_c(p_c,m_c,10,30)
c3 = function_c(p_c,m_c,30,100)
c4 = function_c(p_c,m_c,100,300)
c5 = function_c(p_c,m_c,300,1000)
####Missed planets in each bin####
def function_m(comp_d, p_d, m_d, min, max):
    m = 0
    for mi in range(len(comp_d)):
        if p_d[mi] < 50 and m_d[mi] >= min and m_d[mi] <= max:
            m += 1/comp_d[mi] - 1
    return m
m1 = function_m(comp_d,p_d,m_d,3,10)
m2 = function_m(comp_d,p_d,m_d,10,30)
m3 = function_m(comp_d,p_d,m_d,30,100)
m4 = function_m(comp_d,p_d,m_d,100,300)
m5 = function_m(comp_d,p_d,m_d,300,1000)
occ1 = (d1+c1+m1)/average1
occ2 = (d2+c2+m2)/average2
occ3 = (d3+c3+m3)/average3
occ4 = (d4+c4+m4)/average4
occ5 = (d5+c5+m5)/average5
N = 5
dp = (d1, d2, d3, d4, d5)
cp = (c1, c2, c3, c4, c5)
mp = (m1, m2, m3, m4, m5)
planets = (dp, cp, mp)
ind = np.arange(N)
width = 0.9
p1 = plt.bar(ind, dp, width, color='red')
p2 = plt.bar(ind, cp, width, color='blue', bottom=dp)
p3 = plt.bar(ind, mp, width, color='yellow', bottom=[i+j for i,j in zip(dp, cp)])
plt.legend((p1[0], p2[0], p3[0]), ('DP', 'CP', 'MP'))
plt.show()
I don't understand why I get this error for my code:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The line in the code that is causing this issue is:
p3 = plt.bar(ind, mp, width, color='yellow', bottom=[i+j for i,j in zip(dp, cp)])
This error arises when you do something like:
if a < b:
when a or b is an array.
I can't trace where this might be since I don't have your input text files (and you haven't provided the full error trace), but you have a lot of if statements that are potential culprits.
The problem is that a < b in the case of an array resolves to an array of boolean values, for example,
array([True, True, False])
which the if can't parse. np.any and np.all will parse the array of booleans to, as per my example, True for np.any and False for np.all.
You have to use:
np.logical_or(a, b)
np.logical_and(a, b)
for elementwise boolean logic on numpy arrays. It works really well for me!
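A minimal, self-contained demonstration of all of the above, with made-up arrays:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 3])

print(a < b)           # elementwise comparison: [ True  True False]
# if a < b:            # raises "The truth value of an array ... is ambiguous"
print(np.any(a < b))   # True  -- at least one element satisfies the comparison
print(np.all(a < b))   # False -- not every element does
print(np.logical_and(a < b, a > 0))  # elementwise boolean logic: [ True  True False]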
You are getting this error because you are trying to plot an array versus a single point using plt.bar, i.e. you are trying to plot ind[0] versus mp[0] = m1, which is itself an array (function_m sums over comp_d, which interp2d returns as a 2-D array).
If you want to do this, you would have to call plt.bar for every scalar element, i.e. separately for each mp[i][j].