I have a NumPy array:
A = [ 1.56 1.47 1.31 1.16 1.11 1.14 1.06 1.12 1.19 1.06 0.92 0.78
0.6 0.59 0.4 0.03 0.11 0.54 1.17 1.9 2.6 3.28 3.8 4.28
4.71 4.61 4.6 4.41 3.88 3.46 3.04 2.63 2.3 1.75 1.24 1.14
0.97 0.92 0.94 1. 1.15 1.33 1.37 1.48 1.53 1.45 1.32 1.08
1.06 0.98 0.69]
How can I obtain the Shannon entropy?
I have seen it written like this, but I am not sure it is right:
print -np.sum(A * np.log2(A), axis=1)
There are essentially two cases and it is not clear from your sample which one applies here.
(1) Your probability distribution is discrete. Then you have to translate what appear to be relative frequencies to probabilities
pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(pA))
(2) Your probability distribution is continuous. In that case the values in your input needn't sum to one. Assuming that the input is sampled regularly from the entire space, you'd get
pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(A))
but in this case the formula really depends on the details of sampling and the underlying space.
Side note: the axis=1 in your example will cause an error since your input is flat. Omit it.
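For the discrete case (1), a minimal sketch could look like the following. It assumes the values are non-negative relative frequencies; the function name shannon_entropy is illustrative and not part of the question. Zero entries are dropped because p*log2(p) is conventionally taken as 0 when p = 0.

```python
import numpy as np

def shannon_entropy(freqs):
    """Shannon entropy in bits of non-negative relative frequencies."""
    freqs = np.asarray(freqs, dtype=float).ravel()
    p = freqs / freqs.sum()   # normalize to probabilities
    p = p[p > 0]              # drop zeros: 0 * log2(0) is treated as 0
    return -np.sum(p * np.log2(p))

# Uniform distribution over 4 outcomes carries 2 bits
print(shannon_entropy([1, 1, 1, 1]))
```

Calling shannon_entropy(A) on the array from the question applies the same normalization as pA = A / A.sum() above.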
I'm looking for code that solves for an arbitrary number of unknown coefficients (c_10, c_01, c_11, etc.) by fitting a curve to the plotted data.
Some background on the equation:
Mooney-Rivlin model (1940) with P1 = c_10[(2*λ+λ**2)-3]+c_01[(λ**-2+2*λ)-3].
P1 (also written P) and λ are given numerically in the table below (sheet ExperimentData of experimental_data1.xlsx):
λ P
1.00 0.00
1.01 0.03
1.12 0.14
1.24 0.23
1.39 0.32
1.61 0.41
1.89 0.50
2.17 0.58
2.42 0.67
3.01 0.85
3.58 1.04
4.03 1.21
4.76 1.58
5.36 1.94
5.76 2.29
6.16 2.67
6.40 3.02
6.62 3.39
6.87 3.75
7.05 4.12
7.16 4.47
7.27 4.85
7.43 5.21
7.50 5.57
7.61 6.30
I have tried obtaining the coefficients using linear regression. However, to my knowledge, a random forest cannot provide multiple coefficients via
reg.coef_
I also tried SVR with
reg.dual_coef_
However, I keep getting the error
ValueError: not enough values to unpack (expected 2, got 1)
Code below:
data = pd.read_excel('experimental_data.xlsx', sheet_name='ExperimentData')
X_s = [[(2*λ+λ**2)-3, (λ**-2+2*λ)-3] for λ in data['λ']]
y_s = data['P']
svr = SVR()
svr.fit(X_s, y_s)
c_01, c_10 = svr.dual_coef_
And to future-proof this method: if, let's say, there are more than two coefficients, are there other methods apart from linear regression?
For example, consider the Ishihara model (1951), where
P1 = {2*c_10 + 4*c_20*c_01[(2*λ**-1+λ**2) - 3]*[(λ**-2 + 2*λ) - 3] + c_20 * c_01 * (λ**-1) * [(2*λ**-1 + λ**2) - 3]**2}*{λ - λ**-2}
Any comments are greatly appreciated!
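A note on the error: SVR's dual_coef_ has shape (1, n_SV), holding one weight per support vector rather than one per feature, so unpacking it into two names fails. Because the Mooney-Rivlin expression is linear in c_10 and c_01, one possible approach is ordinary least squares without an intercept. The sketch below is an assumption about what is wanted, not a confirmed solution; it uses np.linalg.lstsq and types in the tabulated data directly instead of reading the Excel file.

```python
import numpy as np

# Data copied from the table in the question
lam = np.array([1.00, 1.01, 1.12, 1.24, 1.39, 1.61, 1.89, 2.17, 2.42,
                3.01, 3.58, 4.03, 4.76, 5.36, 5.76, 6.16, 6.40, 6.62,
                6.87, 7.05, 7.16, 7.27, 7.43, 7.50, 7.61])
P = np.array([0.00, 0.03, 0.14, 0.23, 0.32, 0.41, 0.50, 0.58, 0.67,
              0.85, 1.04, 1.21, 1.58, 1.94, 2.29, 2.67, 3.02, 3.39,
              3.75, 4.12, 4.47, 4.85, 5.21, 5.57, 6.30])

# One column per coefficient, using the two terms of the model as written
X = np.column_stack([(2 * lam + lam**2) - 3,
                     (lam**-2 + 2 * lam) - 3])

# Least-squares solution of X @ [c_10, c_01] = P (no intercept term)
coef, residuals, rank, sv = np.linalg.lstsq(X, P, rcond=None)
c_10, c_01 = coef
print("c_10 =", c_10, "c_01 =", c_01)
```

Adding further linear terms only means adding columns to X. For a model that is nonlinear in its coefficients, such as one containing products like c_20*c_01, a nonlinear least-squares routine such as scipy.optimize.curve_fit is one option to consider.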
X =
[[14.23 3.06 5.64 2.43]
[13.2 2.76 4.38 2.14]
[13.16 3.24 5.68 2.67]
[14.37 3.49 7.8 2.5 ]
[13.24 2.69 4.32 2.87]
[14.2 3.39 6.75 2.45]
[14.39 2.52 5.25 2.45]
[14.06 2.51 5.05 2.61]
[14.83 2.98 5.2 2.17]
[13.86 3.15 7.22 2.27]
[14.1 3.32 5.75 2.3 ]
[14.12 2.43 5. 2.32]
[13.75 2.76 5.6 2.41]
[14.75 3.69 5.4 2.39]
[14.38 3.64 7.5 2.38]
[13.63 2.91 7.3 2.7 ]
[14.3 3.14 6.2 2.72]
[13.83 3.4 6.6 2.62]
[14.19 3.93 8.7 2.48]
[13.64 3.03 5.1 2.56]]
Here is my dataset. Now I want to calculate the Euclidean distance between two of the vectors (rows).
Row1 = X[1]
Row2 = X[2]
My function:
def Edistance (v1, v2):
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)
Edistance(Row1,Row2)
I then get TypeError: NumPy array is not callable. Can I not use an array as my function's input?
You can pass any object as a function argument, so you can pass arrays, but as @xdurch0 mentioned earlier, your syntax is wrong.
def Edistance (v1: dict, v2: dict): # You
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)
What you are doing here is calling v1 and v2 as if they were functions, since () is the call operator. What you want, as far as I understand, is to use [] to index the element inside the array.
So, basically, you want to do v1[i] and v2[i] (instead of v1(i) and v2(i) respectively).
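Putting that together, a corrected version might look like the sketch below. Besides the brackets, it changes two other things that look unintended in the original: the loop now covers every element (range(len(v1)-1) skips the last one), and the difference is squared as a whole rather than only v2's term. This assumes all four columns are features; the lowercase name edistance is illustrative.

```python
import numpy as np
from math import sqrt

def edistance(v1, v2):
    """Euclidean distance between two equal-length vectors."""
    distance = 0.0
    for i in range(len(v1)):              # include the last element
        distance += (v1[i] - v2[i]) ** 2  # square the whole difference
    return sqrt(distance)

row1 = np.array([13.2, 2.76, 4.38, 2.14])   # X[1] from the question
row2 = np.array([13.16, 3.24, 5.68, 2.67])  # X[2] from the question

print(edistance(row1, row2))
print(np.linalg.norm(row1 - row2))  # vectorized equivalent
```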
I have a dataframe, height_df, with three measurements, 'height_1','height_2','height_3'. I want to create a new column that has the mean of all three heights. A printout of height_df is given below
height_1 height_2 height_3
0 1.78 1.80 1.80
1 1.70 1.70 1.69
2 1.74 1.75 1.73
3 1.66 1.68 1.67
The following code works but I don't understand why
height_df['height'] = height_df[['height_1','height_2','height_3']].mean(axis=1)
I actually want the mean across the row axes, i.e. for each row compute the average of the three heights. I would have thought then that the axis argument in mean should be set to 0, as this is what corresponds to applying the mean across rows, however axis=1 is what gets the result I am looking for. Why is this? If axis=1 is for columns and axis=0 is for rows then why does .mean(axis=1) take the mean across rows?
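The axis argument names the axis that gets collapsed, not the axis of the result. With axis=1 the mean runs across the columns, so you get one value per row: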
df = pd.DataFrame({"height_1":[1.78,1.7,1.74,1.66],"height_2":[1.8,1.7,1.75,1.68],"height_3":[1.8,1.69,1.73,1.67]})
df = df.assign(height_mean=df.mean(axis=1))
df = df.assign(height_mean=df.loc[:,['height_1','height_2','height_3']].mean(axis=1))
print(df.to_string(index=False))
output
height_1 height_2 height_3 height_mean
1.78 1.80 1.80 1.793333
1.70 1.70 1.69 1.696667
1.74 1.75 1.73 1.740000
1.66 1.68 1.67 1.670000
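To see the difference between the two axis values directly, compare the shapes of the results. This is a small sketch using the same data; the names col_means and row_means are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "height_1": [1.78, 1.70, 1.74, 1.66],
    "height_2": [1.80, 1.70, 1.75, 1.68],
    "height_3": [1.80, 1.69, 1.73, 1.67],
})

col_means = df.mean(axis=0)  # collapses the rows: one value per column
row_means = df.mean(axis=1)  # collapses the columns: one value per row

print(col_means.shape)  # (3,)
print(row_means.shape)  # (4,)
```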
I have two rgb images of same size, and I would like to compute a similarity metric. I thought of starting out with euclidean distance:
import scipy.spatial.distance as dist
import cv2
im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")
>> im1.shape
(820, 740, 3)
>> dist.euclidean(im1,im2)
ValueError: Input vector should be 1-D.
I know that dist.euclidean expects a 1-D array and im1 and im2 are 3-D, but is there a function that will work with 3-D arrays, or is it possible to transform im1 and im2 into a 1-D array that preserves the information in the images?
Grayscale Solution (?)
(There is a discussion below as to your comment about a function "that preserves the information in the images")
It seems possible to me that you might be able to solve the problem using a grayscale image rather than an RGB image. I know I'm making assumptions here, but it's a thought.
I'm going to try a simple example relating to your code, then give an example of an image similarity measure using 2D Discrete Fourier Transforms that uses a conversion to grayscale. That DFT analysis will have its own section
(My apologies if you see this while in progress. I'm just trying to make sure my work is saved.)
Because of my assumption, I'm going to try your method with some RGB images, then see if the problem will be solved by converting to grayscale. If the problem is solved with grayscale, we can do an analysis of the amount of information loss brought on by the grayscale solution by finding the image similarity using a combination of all three channels, each compared separately.
Method
Making sure I have all the libraries/packages/whatever you want to call them.
> python -m pip install opencv-python
> python -m pip install scipy
> python -m pip install numpy
Note that, in this trial, I'm using some PNG images that were created in the attempt (described below) to use a 2D DFT.
Making sure I get the same problem
>>> import scipy.spatial.distance as dist
>>> import cv2
>>>
>>> im1 = cv2.imread("rhino1_clean.png")
>>> im2 = cv2.imread("rhino1_streak.png")
>>>
>>> im1.shape
(178, 284, 3)
>>>
>>> dist.euclidean(im1, im2)
## Some traceback stuff ##
ValueError: Input vector should be 1-D.
Now, let's try using grayscale. If this works, we can simply find the distance for each of the RGB Channels. I hope it works, because I want to do the information-loss analysis.
Let's convert to grayscale:
>>> im1_gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
>>> im2_gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
>>> im1_gray.shape
(178, 284)
A simple dist.euclidean(im1_gray, im2_gray) will lead to the same ValueError: Input vector should be 1-D. exception, but I know the structure of a grayscale image array (an array of pixel rows), so I do the following.
>>> dists = []
>>> for i in range(0, len(im1_gray)):
...     dists.append(dist.euclidean(im1_gray[i], im2_gray[i]))
...
>>> sum_dists = sum(dists)
>>> ave_dist = sum_dists/len(dists)
>>> ave_dist
2185.9891304058297
By the way, here are the two original images:
Grayscale worked (with massaging), let's try color
Following some procedure from this SO answer, let's do the following.
Preservation of Information
Following the analysis here (archived), let's look at our information loss. (Note that this will be a very naïve analysis, but I want to take a crack at it.)
Grayscale vs. Color Information
Let's just look at the color vs. the grayscale. Later, we can look at whether we preserve the information about the distances.
Comparisons of different distance measures using grayscale vs. all three channels - comparison using ratio of distance sums for a set of images.
I don't know how to do entropy measurements for the distances, but my intuition tells me that, if I calculate distances using grayscale and using color channels, I should come up with similar ratios of distances IF I haven't lost any information.
My first thought when seeing this question was to use a 2-D Discrete Fourier Transform, which I'm sure is available in Python or NumPy or OpenCV. Basically, your first components of the DFT will relate to large shapes in your image. (Here is where I'll put in a relevant research paper: link. I didn't look too closely - anyone is welcome to suggest another.)
So, let me look up a 2-D DFT easily available from Python, and I'll get back to putting up some working code.
First, you'll need to make sure you have Pillow (the maintained fork of PIL) and NumPy. It seems you have NumPy, but here are some instructions. (Note that I'm on Windows at the moment.)
> python -m pip install opencv-python
> python -m pip install numpy
> python -m pip install pillow
Now, here are 5 images -
a rhino image, rhino1_clean.jpg (source);
the same image with some black streaks drawn on by me in MS Paint, rhino1_streak.jpg;
another rhino image, rhino2_clean.jpg (source);
a first hippo image hippo1_clean.jpg (source);
a second hippo image, hippo2_clean.jpg (source).
All images used with fair use.
Okay, now, to illustrate further, let's go to the Python interactive terminal.
>python
>>> import PIL.Image
>>> import numpy as np
First of all, life will be easier if we use grayscale PNG images: PNG because it is losslessly compressed (unlike JPEG), grayscale because I don't have to show all the details with the channels.
>>> rh_img_1_cln = PIL.Image.open("rhino1_clean.jpg")
>>> rh_img_1_cln.save("rhino1_clean.png")
>>> rh_img_1_cln_gs = PIL.Image.open("rhino1_clean.png").convert('L')
>>> rh_img_1_cln_gs.save("rhino1_clean_gs.png")
Follow similar steps for the other four images. I used PIL variable names, rh_img_1_stk, rh_img_2_cln, hp_img_1_cln, hp_img_2_cln. I ended up with the following image filenames for the grayscale images, which I'll use further: rhino1_streak_gs.png, rhino2_clean_gs.png, hippo1_clean_gs.png, hippo2_clean_gs.png.
Now, let's get the coefficients for the DFTs. The following code (ref. this SO answer) would be used for the first, clean rhino image.
Let's "look" at the image array, first. This will show us a grid version of the top-left corner, with higher values being more white and lower values being more black.
Note that, before I begin outputting this array, I set things to the numpy default, cf. https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html
>>> np.set_printoptions(edgeitems=3,infstr='inf',
... linewidth=75, nanstr='nan', precision=8,
... suppress=False, threshold=1000, formatter=None)
>>> rh1_cln_gs_array = np.array(rh_img_1_cln_gs)
>>> for i in range(5):
...     print(rh1_cln_gs_array[i][:13])
...
[93 89 78 87 68 74 58 51 73 96 90 75 86]
[85 93 64 64 76 49 19 52 65 76 86 81 76]
[107 87 71 62 54 31 32 49 51 55 81 87 69]
[112 93 94 72 57 45 58 48 39 49 76 86 76]
[ 87 103 90 65 88 61 44 57 34 55 70 80 92]
Now, let's run the DFT and look at the results. I change my numpy print options to make things nicer before I start the actual transform.
>>> np.set_printoptions(formatter={'all':lambda x: '{0:.2f}'.format(x)})
>>>
>>> rh1_cln_gs_fft = np.fft.fft2(rh_img_1_cln_gs)
>>> rh1_cln_gs_scaled_fft = 255.0 * rh1_cln_gs_fft / rh1_cln_gs_fft.max()
>>> rh1_cln_gs_real_fft = np.absolute(rh1_cln_gs_scaled_fft)
>>> for i in range(5):
...     print(rh1_cln_gs_real_fft[i][:13])
...
[255.00 1.46 7.55 4.23 4.53 0.67 2.14 2.30 1.68 0.77 1.14 0.28 0.19]
[38.85 5.33 3.07 1.20 0.71 5.85 2.44 3.04 1.18 1.68 1.69 0.88 1.30]
[29.63 3.95 1.89 1.41 3.65 2.97 1.46 2.92 1.91 3.03 0.88 0.23 0.86]
[21.28 2.17 2.27 3.43 2.49 2.21 1.90 2.33 0.65 2.15 0.72 0.62 1.13]
[18.36 2.91 1.98 1.19 1.20 0.54 0.68 0.71 1.25 1.48 1.04 1.58 1.01]
Now, the result for following the same procedure with rhino1_streak.jpg
[255.00 3.14 7.69 4.72 4.34 0.68 2.22 2.24 1.84 0.88 1.14 0.55 0.25]
[40.39 4.69 3.17 1.52 0.77 6.15 2.83 3.00 1.40 1.57 1.80 0.99 1.26]
[30.15 3.91 1.75 0.91 3.90 2.99 1.39 2.63 1.80 3.14 0.77 0.33 0.78]
[21.61 2.33 2.64 2.86 2.64 2.34 2.25 1.87 0.91 2.21 0.59 0.75 1.17]
[18.65 3.34 1.72 1.76 1.44 0.91 1.00 0.56 1.52 1.60 1.05 1.74 0.66]
I'll print the differences (Δ) rather than a more comprehensive distance. If you want a Euclidean distance, sum the squares of the differences and take the square root.
>>> for i in range(5):
...     print(rh1_cln_gs_real_fft[i][:13] - rh1_stk_gs_real_fft[i][:13])
...
[0.00 -1.68 -0.15 -0.49 0.19 -0.01 -0.08 0.06 -0.16 -0.11 -0.01 -0.27
-0.06]
[-1.54 0.64 -0.11 -0.32 -0.06 -0.30 -0.39 0.05 -0.22 0.11 -0.11 -0.11 0.04]
[-0.53 0.04 0.14 0.50 -0.24 -0.02 0.07 0.30 0.12 -0.11 0.11 -0.10 0.08]
[-0.33 -0.16 -0.37 0.57 -0.15 -0.14 -0.36 0.46 -0.26 -0.07 0.13 -0.14
-0.04]
[-0.29 -0.43 0.26 -0.58 -0.24 -0.37 -0.32 0.15 -0.27 -0.12 -0.01 -0.17
0.35]
I'll be putting just three coefficient arrays truncated to a length of five to show how this works for showing image similarity. Honestly, this is an experiment for me, so we'll see how it goes.
You can work on comparing those coefficients with distances or other metrics.
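As one way to turn those coefficients into a single number, the sketch below takes the Euclidean distance between the k x k lowest-frequency DFT magnitudes of two grayscale arrays. It is only an illustration: the function name dft_distance and the parameter k are made up here, the 255/max scaling used above is omitted, and the test arrays are synthetic random data standing in for the rhino images.

```python
import numpy as np

def dft_distance(img_a, img_b, k=5):
    """Euclidean distance between the k x k top-left DFT magnitudes."""
    mag_a = np.abs(np.fft.fft2(np.asarray(img_a, dtype=float)))[:k, :k]
    mag_b = np.abs(np.fft.fft2(np.asarray(img_b, dtype=float)))[:k, :k]
    return np.linalg.norm(mag_a - mag_b)

# Synthetic stand-ins for a clean image and a streaked copy
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(32, 32))
streak = clean.copy()
streak[10:12, :] = 0  # draw a black horizontal streak

print(dft_distance(clean, clean))   # identical images give 0.0
print(dft_distance(clean, streak))  # the streak gives a positive distance
```

A small k keeps only the coarse structure of the image; how large k should be for real photographs is something to experiment with.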
More About Preservation of Information
Let's do an information-theoretical analysis of information loss with the methods proposed above.
Following the analysis here (archived), let's look at our information loss.
Good luck!
You can try
import scipy.spatial.distance as dist
import cv2
import numpy as np
im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")
dist.euclidean(im1.flatten(), im2.flatten())
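One caveat with flattening: cv2.imread returns uint8 arrays, and subtracting uint8 arrays in NumPy wraps around (0 - 1 gives 255). Whether a given SciPy version casts internally is not checked here; the sketch below sidesteps the question by casting to float first. The helper name image_distance is illustrative, and the tiny arrays stand in for real images.

```python
import numpy as np

def image_distance(im_a, im_b):
    """Euclidean distance between two same-shaped images, computed in float."""
    a = np.asarray(im_a, dtype=np.float64).ravel()
    b = np.asarray(im_b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("images must have the same number of elements")
    return np.linalg.norm(a - b)

# Stand-ins for two small 3-channel images that differ by 1 everywhere
im_a = np.zeros((2, 2, 3), dtype=np.uint8)
im_b = np.ones((2, 2, 3), dtype=np.uint8)

print((im_a - im_b)[0, 0, 0])      # uint8 wraparound: 255, not -1
print(image_distance(im_a, im_b))  # sqrt(12), since 12 entries differ by 1
```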
You can use the reshape function for both images to convert them from 3D to 1D.
import scipy.spatial.distance as dist
import cv2
im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")
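im1 = im1.reshape(-1)  # reshape returns a new array, so assign the result
im2 = im2.reshape(-1)  # -1 infers the length (820 * 740 * 3 = 1820400)
dist.euclidean(im1,im2)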
I have data with one independent variable x and two dependent variables y1 and y2 as shown below:
x y1 y2
-1.5 16.25 1.02
-1.25 17 1.03
-1 15 1.03
-0.75 9 1.09
-0.5 5.9 1.15
-0.25 5.2 1.17
0 4.77 1.19
+0.25 3.14 1.35
+0.5 2.5 1.54
+0.75 2.21 1.69
+1 1.91 1.96
+1.25 1.64 2.27
+1.5 1.52 2.56
+1.75 1.37 3.06
+2 1.24 4.12
+2.25 1.2 4.44
+2.5 1.18 4.95
+2.75 1.12 6.49
+3 1.07 10
So, here the value of x where y1 = y2 is somewhere around +1. How do I read the data and calculate this in Python?
The naive solution goes like this:
txt = """-1.5 16.25 1.02
-1.25 17 1.03
-1 15 1.03
-0.75 9 1.09
-0.5 5.9 1.15
-0.25 5.2 1.17
0 4.77 1.19
+0.25 3.14 1.35
+0.5 2.5 1.54
+0.75 2.21 1.69
+1 1.91 1.96
+1.25 1.64 2.27
+1.5 1.52 2.56
+1.75 1.37 3.06
+2 1.24 4.12
+2.25 1.2 4.44
+2.5 1.18 4.95
+2.75 1.12 6.49
+3 1.07 10"""
import numpy as np
# StringIO behaves like a file object; use it to simulate reading from a file
from io import StringIO
x,y1,y2=np.transpose(np.loadtxt(StringIO(txt)))
p1 = np.poly1d(np.polyfit(x, y1, 1))
p2 = np.poly1d(np.polyfit(x, y2, 1))
print('equations: ',p1,p2)
# y1 and y2 have to be equal for some x, which you solve for:
# a x + b = c x + d --> (a-c) x = d - b
a,b=list(p1)
c,d=list(p2)
x0=(d-b)/(a-c)  # named x0 so the data array x is not overwritten
print('solution x= ',x0)
output:
equations:
-3.222 x + 7.323
1.409 x + 1.686
solution x= 1.21717324767
But then you plot the 'lines':
import matplotlib.pyplot as p
%matplotlib inline
p.plot(x,y1,'.-')
p.plot(x,y2,'.-')
And you realize that the linear assumption only holds over a few segments near the crossing.
x,y1,y2=np.transpose(np.loadtxt(StringIO(txt)))
x,y1,y2=x[8:13],y1[8:13],y2[8:13]
p1 = np.poly1d(np.polyfit(x, y1, 1))
p2 = np.poly1d(np.polyfit(x, y2, 1))
print('equations: ',p1,p2)
a,b=list(p1)
c,d=list(p2)
x0=(d-b)/(a-c)
print('solution x= ',x0)
p.plot(x,y1,'.-')
p.plot(x,y2,'.-')
Output:
equations:
-1.012 x + 2.968
1.048 x + 0.956
solution x= 0.976699029126
Even now one could improve by leaving two more points out (looking very linear, but that can be coincidental for a few points).
x,y1,y2=np.transpose(np.loadtxt(StringIO(txt)))
x1,x2=x[8:12],x[9:13]
y1,y2=y1[8:12],y2[9:13]
p1 = np.poly1d(np.polyfit(x1, y1, 1))
p2 = np.poly1d(np.polyfit(x2, y2, 1))
print('equations: ',p1,p2)
a,b=list(p1)
c,d=list(p2)
x0=(d-b)/(a-c)
print('solution x= ',x0)
import matplotlib.pyplot as p
%matplotlib inline
p.plot(x1,y1,'.-')
p.plot(x2,y2,'.-')
Output:
equations:
-1.152 x + 3.073
1.168 x + 0.806
solution x= 0.977155172414
Possibly better would be to use more points and apply a 2nd order interpolation np.poly1d(np.polyfit(x,y1,2)) and then solve the equality for two 2nd order polynomials, which I leave as an exercise (quadratic equation) for the reader.
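For readers who would rather not do the exercise by hand, here is one possible sketch of the second-order variant. It fits quadratics to a window of five points around the crossing (indices 7 to 11, chosen by eye as an assumption), subtracts them, and keeps the real root that falls inside the window. The data is typed in directly so the block runs on its own.

```python
import numpy as np

x = np.array([-1.5, -1.25, -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75,
              1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3])
y1 = np.array([16.25, 17, 15, 9, 5.9, 5.2, 4.77, 3.14, 2.5, 2.21,
               1.91, 1.64, 1.52, 1.37, 1.24, 1.2, 1.18, 1.12, 1.07])
y2 = np.array([1.02, 1.03, 1.03, 1.09, 1.15, 1.17, 1.19, 1.35, 1.54, 1.69,
               1.96, 2.27, 2.56, 3.06, 4.12, 4.44, 4.95, 6.49, 10])

window = slice(7, 12)  # x from 0.25 to 1.25, around the visible crossing
xw = x[window]

# Second-order fits of both curves within the window
p1 = np.poly1d(np.polyfit(xw, y1[window], 2))
p2 = np.poly1d(np.polyfit(xw, y2[window], 2))

# y1 = y2 where the difference polynomial is zero
roots = (p1 - p2).roots
real_roots = roots[np.isreal(roots)].real

# Keep only roots that lie inside the fitted window
crossings = real_roots[(real_roots >= xw.min()) & (real_roots <= xw.max())]
print('solution x= ', crossings)
```

The second root of the difference polynomial lies far outside the data range, which is why the window filter matters.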