X =
[[14.23 3.06 5.64 2.43]
[13.2 2.76 4.38 2.14]
[13.16 3.24 5.68 2.67]
[14.37 3.49 7.8 2.5 ]
[13.24 2.69 4.32 2.87]
[14.2 3.39 6.75 2.45]
[14.39 2.52 5.25 2.45]
[14.06 2.51 5.05 2.61]
[14.83 2.98 5.2 2.17]
[13.86 3.15 7.22 2.27]
[14.1 3.32 5.75 2.3 ]
[14.12 2.43 5. 2.32]
[13.75 2.76 5.6 2.41]
[14.75 3.69 5.4 2.39]
[14.38 3.64 7.5 2.38]
[13.63 2.91 7.3 2.7 ]
[14.3 3.14 6.2 2.72]
[13.83 3.4 6.6 2.62]
[14.19 3.93 8.7 2.48]
[13.64 3.03 5.1 2.56]]
Here is my dataset. Now I want to calculate the Euclidean distance between two of the vectors (rows):
Row1 = X[1]
Row2 = X[2]
My function:
def Edistance(v1, v2):
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)

Edistance(Row1, Row2)
I then get TypeError: 'numpy.ndarray' object is not callable. Can I not use an array as input to my function?
You can pass any object as a function argument, so you can pass arrays; but as #xdurch0 mentioned earlier, your syntax is wrong.
def Edistance(v1: dict, v2: dict):  # your function
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)
What you are trying to do here is call v1 and v2 as if they were functions, since () is used to call something. But what you want, as far as I understand, is to use [] to reference an element inside the array.
So, basically, you want v1[i] and v2[i] (instead of v1(i) and v2(i), respectively).
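Beyond the bracket fix, two more issues lurk in the original: range(len(v1)-1) skips the last element, and only the second term is squared rather than the whole difference; sqrt also has to be imported from somewhere (math or numpy). A corrected sketch:

```python
import numpy as np

def edistance(v1, v2):
    """Euclidean distance between two equal-length vectors."""
    distance = 0.0
    for i in range(len(v1)):               # all elements, not len(v1)-1
        distance += (v1[i] - v2[i]) ** 2   # square the whole difference
    return np.sqrt(distance)

row1 = np.array([13.2, 2.76, 4.38, 2.14])
row2 = np.array([13.16, 3.24, 5.68, 2.67])
print(edistance(row1, row2))  # same value as np.linalg.norm(row1 - row2)
```

In practice np.linalg.norm(row1 - row2) does the same thing in one call.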
I'm looking to run code that solves for an x number of unknowns (c_10, c_01, c_11, etc.) just from the plotted data.
Some background on the equation:
Mooney-Rivlin model (1940), with P1 = c_10*[(2*λ + λ**2) - 3] + c_01*[(λ**-2 + 2*λ) - 3].
P1 (also written P) and λ are given numerically in the table below (sheet ExperimentData of experimental_data1.xlsx):
λ P
1.00 0.00
1.01 0.03
1.12 0.14
1.24 0.23
1.39 0.32
1.61 0.41
1.89 0.50
2.17 0.58
2.42 0.67
3.01 0.85
3.58 1.04
4.03 1.21
4.76 1.58
5.36 1.94
5.76 2.29
6.16 2.67
6.40 3.02
6.62 3.39
6.87 3.75
7.05 4.12
7.16 4.47
7.27 4.85
7.43 5.21
7.50 5.57
7.61 6.30
I have tried obtaining the coefficients using linear regression. However, to my knowledge, a random forest cannot return multiple coefficients via
reg.coef_
I also tried SVR with
reg.dual_coef_
but I keep getting this error:
ValueError: not enough values to unpack (expected 2, got 1)
Code below:
data = pd.read_excel('experimental_data.xlsx', sheet_name='ExperimentData')
X_s = [[(2*λ+λ**2)-3, (λ**-2+2*λ)-3] for λ in data['λ']]
y_s = data['P']
svr = SVR()
svr.fit(X_s, y_s)
c_01, c_10 = svr.dual_coef_
And to future-proof this method: if, say, there are more than 2 coefficients, are there other methods apart from linear regression?
For example, referring to Ishihara model (1951) where
P1 = {2*c_10 + 4*c_20*c_01*[(2*λ**-1 + λ**2) - 3]*[(λ**-2 + 2*λ) - 3] + c_20*c_01*(λ**-1)*[(2*λ**-1 + λ**2) - 3]**2} * {λ - λ**-2}
Any comments are greatly appreciated!
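Since the Mooney-Rivlin expression is linear in the unknown coefficients, one route worth noting (a minimal sketch with made-up data; the feature construction mirrors the X_s list above, and the "true" coefficient values are invented for illustration) is plain least squares, which scales to any number of coefficients:

```python
import numpy as np

# hypothetical data standing in for the λ column of the spreadsheet
lam = np.array([1.01, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
c10_true, c01_true = 0.12, 0.05                 # assumed "true" values

A = np.column_stack([(2*lam + lam**2) - 3,      # feature multiplying c_10
                     (lam**-2 + 2*lam) - 3])    # feature multiplying c_01
P = A @ np.array([c10_true, c01_true])          # synthetic P1 measurements

# least squares recovers one coefficient per feature column
coeffs, *_ = np.linalg.lstsq(A, P, rcond=None)
c10, c01 = coeffs
```

With real data you would build A from data['λ'] and pass data['P'] as the right-hand side; sklearn's LinearRegression(fit_intercept=False) gives the same answer via reg.coef_, however many feature columns there are.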
I have a dataframe with messy data.
df:
1 2 3
-- ------- ------- -------
0 123/100 221/100 103/50
1 49/100 333/100 223/50
2 153/100 81/50 229/100
3 183/100 47/25 31/20
4 2.23 3.2 3.04
5 2.39 3.61 2.69
I want the fractional values to be converted to decimal, the conversion being fractional value + 1, e.g.:
123/100 → 123/100 + 1 = 2.23
333/100 → 333/100 + 1 = 4.33
And of course leave the decimal values as is.
How can I do it in Pandas and Python?
A simple way to do this is to first define a conversion function that will be applied to each element in a column:
def convert(s):
    if '/' in s:  # is a fraction
        num, den = s.split('/')
        return 1 + (int(num) / int(den))
    else:
        return float(s)
Then use the .apply function to run all elements of a column through this function:
df['1'] = df['1'].apply(convert)
Result:
df['1']:
0 2.23
1 1.49
2 2.53
3 2.83
4 2.23
5 2.39
Then repeat on any other column as needed.
If you trust the data in your dataset, the simplest way is to use eval or, better, pd.eval as suggested by #mozway:
>>> df.replace(r'(\d+)/(\d+)', r'1+\1/\2', regex=True).applymap(pd.eval)
1 2 3
0 2.23 3.21 3.06
1 1.49 4.33 5.46
2 2.53 2.62 3.29
3 2.83 2.88 2.55
4 2.23 3.20 3.04
5 2.39 3.61 2.69
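If you'd rather not eval strings at all, the standard library's fractions.Fraction parses 'a/b' directly; a sketch (assuming every cell is a string, as when the column was read with object dtype):

```python
from fractions import Fraction

def convert(s):
    # 'a/b' strings go through Fraction; plain decimals fall through to float
    if '/' in s:
        return float(Fraction(s)) + 1
    return float(s)

cells = ['123/100', '333/100', '2.23']
print([convert(c) for c in cells])
```

Apply it per column with df['1'].apply(convert), or to every column at once with df.applymap(convert).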
I have a 2d matrix in Python like this (a 10 rows/20 columns list I use to later do an imshow):
[[-20.17 -12.88 -20.7 -25.69 -21.69 -34.22 -32.65 -31.74 -36.36 -37.65
-41.42 -41.14 -44.01 -43.19 -41.85 -39.25 -40.15 -41.31 -39.73 -28.66]
[ 14.18 53.86 70.03 64.39 72.37 39.95 30.44 28.14 20.77 17.98
25.74 25.66 27.56 37.61 42.39 42.39 35.79 41.65 41.65 41.84]
[ 33.71 68.35 69.39 66.7 59.99 40.08 40.08 40.8 26.19 19.82
19.82 18.07 20.32 19.51 24.77 22.81 21.45 21.45 21.45 23.7 ]
[103.72 55.11 32.3 29.47 16.53 15.54 9.4 8.11 5.06 5.06
13.07 13.07 12.99 13.47 13.47 13.47 12.92 12.92 14.27 20.63]
[ 59.02 18.6 37.53 24.5 13.01 34.35 8.16 13.66 12.57 8.11
8.11 8.11 8.11 8.11 8.11 5.66 5.66 5.66 5.66 7.41]
[ 52.69 14.17 7.25 -5.79 3.19 -1.75 -2.43 -3.98 -4.92 -6.68
-6.68 -6.98 -6.98 -8.89 -8.89 -9.15 -9.15 -9.15 -9.15 -9.15]
[ 29.24 10.78 0.6 -3.15 -12.55 3.04 -1.68 -1.68 -1.41 -6.15
-6.15 -6.15 -10.59 -10.59 -10.59 -10.59 -10.59 -9.62 -10.29 -10.29]
[ 6.6 0.11 2.42 0.21 -5.68 -10.84 -10.84 -13.6 -16.12 -14.41
-15.28 -15.28 -15.28 -18.3 -5.55 -13.16 -13.16 -13.16 -13.16 -14.15]
[ 3.67 -11.69 -6.99 -16.75 -19.31 -20.28 -21.5 -21.5 -34.02 -37.16
-25.51 -25.51 -26.36 -26.36 -26.36 -26.36 -29.38 -29.38 -29.59 -29.38]
[ 31.36 -2.87 0.34 -8.06 -12.14 -22.7 -24.39 -25.51 -26.36 -27.37
-29.38 -31.54 -31.54 -31.54 -32.41 -33.26 -33.26 -15.54 -15.54 -15.54]]
I'm trying to find a way to detect the "zone" of this matrix that contains the highest density of high values in it. It means it might not contain the highest single value of the whole list, obviously.
I suppose to do so I should define how big this zone is, so let's say it should be 2x2 (so I want to find what is the 'square' of 2x2 items containing the highest values).
I keep thinking I have a logical solution, but then I fail to follow the logic of how it could work! Does anyone have a suggestion I could start from?
I know there might be easier ways to do this, but this one is the easiest for me. I've created the following function to perform the task; it takes two arguments:
arr: a 2D numpy array.
zone_size: the size of the square zone.
And the function goes like so:
def get_heighest_zone(arr, zone_size):
    max_sum = float("-inf")
    row_idx, col_idx = 0, 0
    # +1 so the windows touching the last row/column are also checked
    for row in range(arr.shape[0] - zone_size + 1):
        for col in range(arr.shape[1] - zone_size + 1):
            curr_sum = np.sum(arr[row:row+zone_size, col:col+zone_size])
            if curr_sum > max_sum:
                row_idx, col_idx = row, col
                max_sum = curr_sum
    return arr[row_idx:row_idx+zone_size, col_idx:col_idx+zone_size]
Assuming arr is the numpy array posted in your question, applying this function over different zone_sizes will return these values:
>>> get_heighest_zone(arr, 2)
[[70.03 64.39]
[69.39 66.7 ]]
>>> get_heighest_zone(arr, 3)
[[53.86 70.03 64.39]
[68.35 69.39 66.7 ]
[55.11 32.3 29.47]]
>>> get_heighest_zone(arr, 4)
[[ 14.18 53.86 70.03 64.39]
[ 33.71 68.35 69.39 66.7 ]
[103.72 55.11 32.3 29.47]
[ 59.02 18.6 37.53 24.5 ]]
If the zone doesn't have to be square, you will need to modify the code a little. Also, you should assert that zone_size is not larger than either array dimension.
Hopefully, this is what you were looking for!
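For larger arrays the same search can be vectorized; a sketch using numpy's sliding_window_view (available since NumPy 1.20), which computes all window sums in one shot:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def highest_zone(arr, zone_size):
    # sum every zone_size x zone_size window, then locate the maximum
    sums = sliding_window_view(arr, (zone_size, zone_size)).sum(axis=(2, 3))
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return arr[r:r+zone_size, c:c+zone_size]
```

Ties are resolved the same way as in the loop version: the first window in row-major order wins.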
I have an Numpy array:
A = [ 1.56 1.47 1.31 1.16 1.11 1.14 1.06 1.12 1.19 1.06 0.92 0.78
0.6 0.59 0.4 0.03 0.11 0.54 1.17 1.9 2.6 3.28 3.8 4.28
4.71 4.61 4.6 4.41 3.88 3.46 3.04 2.63 2.3 1.75 1.24 1.14
0.97 0.92 0.94 1. 1.15 1.33 1.37 1.48 1.53 1.45 1.32 1.08
1.06 0.98 0.69]
How can I obtain the Shannon entropy?
I have seen it done like this, but I'm not sure:
print -np.sum(A * np.log2(A), axis=1)
There are essentially two cases and it is not clear from your sample which one applies here.
(1) Your probability distribution is discrete. Then you have to translate what appear to be relative frequencies to probabilities
pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(pA))
(2) Your probability distribution is continuous. In that case the values in your input needn't sum to one. Assuming that the input is sampled regularly from the entire space, you'd get
pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(A))
but in this case the formula really depends on the details of sampling and the underlying space.
Side note: the axis=1 in your example will cause an error since your input is flat. Omit it.
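As a cross-check for the discrete case, scipy.stats.entropy normalizes its input and applies the same formula; a small sketch (using the first few values of A as an example):

```python
import numpy as np
from scipy.stats import entropy

A = np.array([1.56, 1.47, 1.31, 1.16, 1.11])  # first few values from the question
pA = A / A.sum()
manual = -np.sum(pA * np.log2(pA))
auto = entropy(A, base=2)  # entropy() normalizes A to probabilities itself
```

The two numbers agree, so for the discrete interpretation the scipy one-liner is all you need.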
I have data with one independent variable x and two dependent variables y1 and y2 as shown below:
x y1 y2
-1.5 16.25 1.02
-1.25 17 1.03
-1 15 1.03
-0.75 9 1.09
-0.5 5.9 1.15
-0.25 5.2 1.17
0 4.77 1.19
+0.25 3.14 1.35
+0.5 2.5 1.54
+0.75 2.21 1.69
+1 1.91 1.96
+1.25 1.64 2.27
+1.5 1.52 2.56
+1.75 1.37 3.06
+2 1.24 4.12
+2.25 1.2 4.44
+2.5 1.18 4.95
+2.75 1.12 6.49
+3 1.07 10
So, the value of x where y1 = y2 is somewhere around +1. How do I read the data and calculate this in Python?
The naive solution goes like this:
txt = """-1.5 16.25 1.02
-1.25 17 1.03
-1 15 1.03
-0.75 9 1.09
-0.5 5.9 1.15
-0.25 5.2 1.17
0 4.77 1.19
+0.25 3.14 1.35
+0.5 2.5 1.54
+0.75 2.21 1.69
+1 1.91 1.96
+1.25 1.64 2.27
+1.5 1.52 2.56
+1.75 1.37 3.06
+2 1.24 4.12
+2.25 1.2 4.44
+2.5 1.18 4.95
+2.75 1.12 6.49
+3 1.07 10"""
import numpy as np
# StringIO behaves like a file object; use it to simulate reading from a file
from io import StringIO

x, y1, y2 = np.transpose(np.loadtxt(StringIO(txt)))
p1 = np.poly1d(np.polyfit(x, y1, 1))
p2 = np.poly1d(np.polyfit(x, y2, 1))
print('equations: ', p1, p2)
# y1 and y2 have to be equal for some x, which you solve for:
# a*x + b = c*x + d  -->  (a-c)*x = d - b
a, b = list(p1)
c, d = list(p2)
x = (d - b) / (a - c)
print('solution x= ', x)
output:
equations:
-3.222 x + 7.323
1.409 x + 1.686
solution x= 1.21717324767
But then you plot the 'lines':
import matplotlib.pyplot as p
%matplotlib inline
p.plot(x,y1,'.-')
p.plot(x,y2,'.-')
And you realize that the linear assumption holds only over a few segments.
x, y1, y2 = np.transpose(np.loadtxt(StringIO(txt)))
x, y1, y2 = x[8:13], y1[8:13], y2[8:13]
p1 = np.poly1d(np.polyfit(x, y1, 1))
p2 = np.poly1d(np.polyfit(x, y2, 1))
print('equations: ', p1, p2)
a, b = list(p1)
c, d = list(p2)
x0 = (d - b) / (a - c)
print('solution x= ', x0)
p.plot(x,y1,'.-')
p.plot(x,y2,'.-')
Output:
equations:
-1.012 x + 2.968
1.048 x + 0.956
solution x= 0.976699029126
Even now one could improve by leaving out two more points (they look very linear, but that can be coincidental for a few points).
x, y1, y2 = np.transpose(np.loadtxt(StringIO(txt)))
x1, x2 = x[8:12], x[9:13]
y1, y2 = y1[8:12], y2[9:13]
p1 = np.poly1d(np.polyfit(x1, y1, 1))
p2 = np.poly1d(np.polyfit(x2, y2, 1))
print('equations: ', p1, p2)
a, b = list(p1)
c, d = list(p2)
x0 = (d - b) / (a - c)
print('solution x= ', x0)
import matplotlib.pyplot as p
%matplotlib inline
p.plot(x1,y1,'.-')
p.plot(x2,y2,'.-')
Output:
equations:
-1.152 x + 3.073
1.168 x + 0.806
solution x= 0.977155172414
Possibly better would be to use more points and apply a 2nd-order fit, np.poly1d(np.polyfit(x, y1, 2)), and then solve the equality of the two 2nd-order polynomials, which I leave as an exercise (a quadratic equation) for the reader.
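A sketch of that exercise, using a mid-range slice of the question's data: poly1d objects subtract directly, and the difference polynomial exposes its roots, so no manual quadratic formula is needed.

```python
import numpy as np
from io import StringIO

txt = """0 4.77 1.19
+0.25 3.14 1.35
+0.5 2.5 1.54
+0.75 2.21 1.69
+1 1.91 1.96
+1.25 1.64 2.27
+1.5 1.52 2.56"""

x, y1, y2 = np.transpose(np.loadtxt(StringIO(txt)))
p1 = np.poly1d(np.polyfit(x, y1, 2))
p2 = np.poly1d(np.polyfit(x, y2, 2))

roots = (p1 - p2).roots.real        # both roots are real for this data
crossing = roots[(roots > x.min()) & (roots < x.max())]
print(crossing)
```

Only one root of the difference polynomial falls inside the data range; that is the crossing point.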