Python contour plot

I have a file which looks like this:
1237665126927237227 7.49126127875 1500 7.0
1237665126927237227 6.64062342139 1750 7.0
1237665126927237227 5.79903397289 2000 7.0
1237665126927237227 7.24807646775 1500 7.5
1237665126927237227 6.51250095795 1750 7.5
1237665126927237227 5.74908888515 2000 7.5
1237665126927237227 6.91915170741 1500 8.0
1237665126927237227 6.29638684709 1750 8.0
1237665126927237227 5.62891381033 2000 8.0
1237665126927237227 6.54437390102 1500 8.5
1237665126927237227 5.98359412299 1750 8.5
1237665126927237227 5.43512459898 2000 8.5
etc
I need to create a plot with the 3rd column as the x axis and the 4th column as the y axis, with the 2nd column drawn as contours on it, with contour lines at 1, 2, 3, 4 and so on.
I am trying to do something along the lines of:
from pylab import *
ChiTable= np.loadtxt('ChiTableSingle.txt')
xlist = linspace(ChiTable[2])
ylist = linspace(ChiTable[3])
X, Y = meshgrid (xlist, ylist)
Z =partsChi[1]
figure()
CP1 = contour(X, Y, Z)
clabel(CP1, inline=True, fontsize=10)
pl.show()
but I'm just getting myself totally confused by it all. I'm getting an error saying the Z input needs to be a 2-D array. I can understand why, as I've made X and Y into 2-D arrays and Z needs values matching up to them, but I've got no idea how I'd go about that.

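You need to reshape your data, not use meshgrid.
Something like:
import numpy as np
import matplotlib.pyplot as plt
ChiTable = np.loadtxt('ChiTableSingle.txt')
# grid size = number of distinct x (3rd column) and y (4th column) values
xdim = len(np.unique(ChiTable[:, 2]))
ydim = len(np.unique(ChiTable[:, 3]))
# x changes fastest in the file, so each row of the reshaped arrays shares one y value
X = ChiTable[:, 2].reshape((ydim, xdim))
Y = ChiTable[:, 3].reshape((ydim, xdim))
Z = ChiTable[:, 1].reshape((ydim, xdim))
plt.contour(X, Y, Z)
plt.show()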
meshgrid takes two 1-D arrays and gives you back every combination of them (a cross of the two), whereas reshape turns an array with N elements into an array with the same N elements arranged in a different shape.
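To check the reshape before plotting, here is a small self-contained sketch using the twelve sample rows from the question. It derives the grid shape from the number of distinct x and y values and also builds contour levels at every whole number, as the question asks. The helper names table_to_grid and integer_levels are hypothetical, and the code assumes the rows form a complete regular grid in which x changes fastest, as in the sample.

```python
import io

import numpy as np

# The twelve sample rows shown in the question: id, chi, x, y.
SAMPLE = """\
1237665126927237227 7.49126127875 1500 7.0
1237665126927237227 6.64062342139 1750 7.0
1237665126927237227 5.79903397289 2000 7.0
1237665126927237227 7.24807646775 1500 7.5
1237665126927237227 6.51250095795 1750 7.5
1237665126927237227 5.74908888515 2000 7.5
1237665126927237227 6.91915170741 1500 8.0
1237665126927237227 6.29638684709 1750 8.0
1237665126927237227 5.62891381033 2000 8.0
1237665126927237227 6.54437390102 1500 8.5
1237665126927237227 5.98359412299 1750 8.5
1237665126927237227 5.43512459898 2000 8.5
"""


def table_to_grid(table):
    """Reshape the flat x, y, chi columns into 2-D arrays for contour().

    Assumes the rows form a complete regular grid in which x (3rd column)
    changes fastest, as in the sample above.
    """
    x, y, z = table[:, 2], table[:, 3], table[:, 1]
    shape = (len(np.unique(y)), len(np.unique(x)))  # one row per y value
    return x.reshape(shape), y.reshape(shape), z.reshape(shape)


def integer_levels(z):
    """Contour levels at every whole number covering the range of z."""
    return np.arange(np.floor(z.min()), np.ceil(z.max()) + 1)


table = np.loadtxt(io.StringIO(SAMPLE))
X, Y, Z = table_to_grid(table)
print(X.shape)  # (4, 3): four y values by three x values
print(integer_levels(Z))
```

With matplotlib, the arrays and levels would then be passed as plt.contour(X, Y, Z, levels=integer_levels(Z)), and the returned contour set can be labelled with plt.clabel as in the question.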

Related

How to create a sine curve of positive part only between two integer values

I have to generate a sine curve of the positive part only between two values. The idea is that my variable, say monthly-averaged RH, which has 12 data points in a year (i.e. a time series), varies between 50 and 70 in a sinusoidal way. The first and the last data points are both 50.
Can anyone help me generate this curve/function so that I get values for all the intermediate data points? I am trying to use numpy/scipy for this.
Best,
Debayan
This is basic trig.
import math
for i in range(12):
    print(i, 50 + 20 * math.sin(math.pi * i / 12))
Output:
0 50.0
1 55.17638090205041
2 60.0
3 64.14213562373095
4 67.32050807568876
5 69.31851652578136
6 70.0
7 69.31851652578136
8 67.32050807568878
9 64.14213562373095
10 60.0
11 55.17638090205042
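The loop above divides by 12, so only the first value is exactly 50; the last one (i = 11) is about 55.18. If both the first and the last of the 12 monthly values should be 50, as the question asks, one option is to spread the angles evenly over [0, pi] with np.linspace. A minimal NumPy sketch (the function name positive_half_sine is just illustrative):

```python
import numpy as np


def positive_half_sine(low, high, n):
    """Return n samples of the positive half of a sine wave.

    The curve starts at `low`, rises towards `high` and comes back to `low`;
    the first and last samples are both `low`.
    """
    theta = np.linspace(0.0, np.pi, n)  # both endpoints 0 and pi are included
    return low + (high - low) * np.sin(theta)


rh = positive_half_sine(50, 70, 12)
for month, value in enumerate(rh):
    print(month, round(float(value), 2))
```

With 12 samples and both ends pinned at 50, no sample falls exactly at the peak, so the largest value is about 69.8 rather than 70; use an odd number of samples if the curve must actually reach 70.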

Arrange Pandas DataFrame to be used for scipy interpn

I have to perform an interpolation in 3 or 4 dimensions, starting from tabular data stored in a pandas DataFrame.
I have the following data stored in the DataFrame variable df:
xm xA xl z
2.3 4.6 10.0 1.905
2.3 4.6 11.0 1.907
2.3 4.8 10.0 1.908
2.3 4.8 11.0 1.909
2.4 4.6 10.0 1.811
2.4 4.6 11.0 1.812
2.4 4.8 10.0 1.813
2.4 4.8 11.0 1.814
xm, xA and xl are the axes along which the grid should be drawn. The column z contains the values to interpolate from. The regular grid I came up with is calculated as:
grid = np.meshgrid(*(df.xm,df.xA,df.xl))
Now my problem is how to turn the z column of the DataFrame into an np.array to be passed to the SciPy function:
from scipy import interpolate
p0 = (xm0,xA0,xl0)
z0 = interpolate.interpn(grid, myarray, p0)
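Thanks to SCKU for the hint on the z-column reshape. I was using
grid = np.meshgrid(*(df.xm,df.xA,df.xl))
following the example from the scipy docs.
It was actually enough to pass the tuple of base axis arrays (the sorted unique values along each axis):
import numpy as np
from scipy import interpolate
xm = np.unique(df.xm)
xA = np.unique(df.xA)
xl = np.unique(df.xl)
# rows are ordered with xm slowest and xl fastest, so a C-order reshape gives (len(xm), len(xA), len(xl))
z = df.z.values.reshape((len(xm), len(xA), len(xl)))
p0 = (xm0, xA0, xl0)
val = interpolate.interpn((xm, xA, xl), z, p0)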
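A self-contained sketch with the eight rows from the question. The helper df_to_grid is hypothetical: it sorts the rows so the last axis varies fastest, takes the sorted unique values of each axis as the grid points, and reshapes z to match. It assumes the rows form a complete grid (every combination present exactly once). One pitfall if you take the shape from np.meshgrid(xm, xA, xl) instead: meshgrid defaults to indexing='xy', which swaps the first two axes, so grid[0].shape is (len(xA), len(xm), len(xl)) and only matches the row order when those two lengths happen to be equal.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# The eight rows shown in the question.
df = pd.DataFrame({
    "xm": [2.3] * 4 + [2.4] * 4,
    "xA": [4.6, 4.6, 4.8, 4.8] * 2,
    "xl": [10.0, 11.0] * 4,
    "z": [1.905, 1.907, 1.908, 1.909, 1.811, 1.812, 1.813, 1.814],
})


def df_to_grid(frame, axes, value):
    """Return (axis point arrays, values array) in the form interpn expects.

    Assumes `frame` holds a complete regular grid over `axes`.
    """
    # After sorting, the first axis varies slowest and the last fastest,
    # which is exactly the order a C-order reshape expects.
    frame = frame.sort_values(list(axes))
    points = tuple(np.unique(frame[a].to_numpy()) for a in axes)
    shape = tuple(len(p) for p in points)
    values = frame[value].to_numpy().reshape(shape)
    return points, values


points, values = df_to_grid(df, ("xm", "xA", "xl"), "z")
# xi as an (m, 3) array: here a single query point in the middle of the cube.
z0 = interpolate.interpn(points, values, np.array([[2.35, 4.7, 10.5]]))
print(z0)
```

Because the query point is at the centre of the cell, linear interpolation returns the mean of the eight corner values, which makes the result easy to check by hand.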

3D interpolation in Python Pandas using a mesh grid

I have the following 10 lines of a large pandas DataFrame df:
X,Y,Z are grid points in xyz-direction; U,V,W are (measured) velocity components in x,y,z-direction.
X Y Z U V W
0 -201.0 -2.00 11.200 3.750 -15.20 -0.75800
1 -201.0 -2.00 12.220 3.640 -15.40 -0.71100
2 -200.0 -3.00 1.079 -1.480 -3.86 0.03670
3 -198.0 -3.00 7.190 4.220 -13.50 -1.31000
4 -198.0 -1.43 5.530 3.510 -10.10 -1.56000
5 -195.0 -1.43 6.140 3.900 -11.80 -1.50000
6 -195.0 -2.54 0.000 -0.767 -5.19 0.00154
7 -195.0 -3.54 0.600 -1.210 -6.04 -0.05580
8 -191.0 -5.54 1.449 -1.510 -2.80 -0.20900
9 -191.0 -7.54 2.392 -0.782 -2.65 -0.56000
I want to now interpolate the values U,V,W over a finer 5x5x5 grid in X,Y,Z.
x = np.arange(-200, -175, 5)
y = np.arange(-10, 5, 5)
z = np.arange(0,20,5)
xx, yy, zz = np.meshgrid(x, y,z )
NT = np.prod(xx.shape)
data_grid = {
"x_grid": np.reshape(xx,NT),
"y_grid": np.reshape(yy,NT),
"z_grid": np.reshape(zz,NT)
}
df2 = pd.DataFrame(data= data_grid)
I see SciPy has the interpolate.griddata function, which I am trying to call (for now I only interpolate U over X, Y, Z).
xp = df['X'].to_numpy()
yp = df['Y'].to_numpy()
zp = df['Z'].to_numpy()
up = df['U'].to_numpy()
U_grid = griddata([(xp,yp,zp)], up, [(x_grid,y_grid,z_grid)], method='nearest')
But this gives me:
"ValueError: different number of values and points"
What do I do wrong?
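For reference, griddata documents points as either an (n, D) array or a tuple of D one-dimensional arrays of length n, and xi as an (m, D) array or a tuple of arrays broadcastable to the same shape. Passing [(xp, yp, zp)], a list holding one tuple, matches neither form, which is the likely source of the mismatch; note also that x_grid, y_grid and z_grid exist only as keys of data_grid, not as variables. A minimal sketch using the ten rows shown, with indexing="ij" chosen here (an assumption, not from the question) so that U_grid[i, j, k] corresponds to (x[i], y[j], z[k]):

```python
import numpy as np
from scipy.interpolate import griddata

# The ten sample rows from the question: coordinates X, Y, Z and velocity U.
X = np.array([-201.0, -201.0, -200.0, -198.0, -198.0,
              -195.0, -195.0, -195.0, -191.0, -191.0])
Y = np.array([-2.00, -2.00, -3.00, -3.00, -1.43,
              -1.43, -2.54, -3.54, -5.54, -7.54])
Z = np.array([11.200, 12.220, 1.079, 7.190, 5.530,
              6.140, 0.000, 0.600, 1.449, 2.392])
U = np.array([3.750, 3.640, -1.480, 4.220, 3.510,
              3.900, -0.767, -1.210, -1.510, -0.782])

# Target grid axes, as in the question.
x = np.arange(-200, -175, 5)   # 5 values
y = np.arange(-10, 5, 5)       # 3 values
z = np.arange(0, 20, 5)        # 4 values
xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")

# points: a tuple of three 1-D arrays, one entry per sample.
# xi: a tuple of arrays with the grid's shape; the result has that shape too.
U_grid = griddata((X, Y, Z), U, (xx, yy, zz), method="nearest")
print(U_grid.shape)  # (5, 3, 4)
```

With the question's default meshgrid indexing ("xy") the result would instead have shape (3, 5, 4); either way, U_grid.ravel() lines up element by element with the flattened xx, yy, zz used to build df2.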

How to plot a separator line between two data classes?

I have a simple exercise that I am not sure how to do. I have the following data sets:
male100
Year Time
0 1896 12.00
1 1900 11.00
2 1904 11.00
3 1906 11.20
4 1908 10.80
5 1912 10.80
6 1920 10.80
7 1924 10.60
8 1928 10.80
9 1932 10.30
10 1936 10.30
11 1948 10.30
12 1952 10.40
13 1956 10.50
14 1960 10.20
15 1964 10.00
16 1968 9.95
17 1972 10.14
18 1976 10.06
19 1980 10.25
20 1984 9.99
21 1988 9.92
22 1992 9.96
23 1996 9.84
24 2000 9.87
25 2004 9.85
26 2008 9.69
and the second one:
female100
Year Time
0 1928 12.20
1 1932 11.90
2 1936 11.50
3 1948 11.90
4 1952 11.50
5 1956 11.50
6 1960 11.00
7 1964 11.40
8 1968 11.00
9 1972 11.07
10 1976 11.08
11 1980 11.06
12 1984 10.97
13 1988 10.54
14 1992 10.82
15 1996 10.94
16 2000 11.12
17 2004 10.93
18 2008 10.78
I have the following code:
y = -0.014*male100['Year']+38
plt.plot(male100['Year'], y, 'b-')
ax = plt.gca() # gca stands for 'get current axis'
ax = male100.plot(x=0,y=1, kind ='scatter', color='g', label="Mens 100m", ax = ax)
female100.plot(x=0,y=1, kind ='scatter', color='r', label="Womens 100m", ax = ax)
Which produces this result:
I need to plot a line that goes exactly between them, so that it leaves all of the green points below it and all of the red points above it. How do I do that?
I've tried playing with the parameters of y, but to no avail. I also tried fitting a linear regression to male100 , female100 , and the merged version of them (across rows), but couldn't get any results.
Any help would be appreciated!
One solution is to use a support vector machine (SVM). A linear SVM finds the two margins that separate the two classes of points; the line halfway between those margins is your answer. Note that this only works when the two sets of points are linearly separable.
You can use the following code to see the result:
Data Entry
male = [
(1896 , 12.00),
(1900 , 11.00),
(1904 , 11.00),
(1906 , 11.20),
(1908 , 10.80),
(1912 , 10.80),
(1920 , 10.80),
(1924 , 10.60),
(1928 , 10.80),
(1932 , 10.30),
(1936 , 10.30),
(1948 , 10.30),
(1952 , 10.40),
(1956 , 10.50),
(1960 , 10.20),
(1964 , 10.00),
(1968 , 9.95),
(1972 , 10.14),
(1976 , 10.06),
(1980 , 10.25),
(1984 , 9.99),
(1988 , 9.92),
(1992 , 9.96),
(1996 , 9.84),
(2000 , 9.87),
(2004 , 9.85),
(2008 , 9.69)
]
female = [
(1928, 12.20),
(1932, 11.90),
(1936, 11.50),
(1948, 11.90),
(1952, 11.50),
(1956, 11.50),
(1960, 11.00),
(1964, 11.40),
(1968, 11.00),
(1972, 11.07),
(1976, 11.08),
(1980, 11.06),
(1984, 10.97),
(1988, 10.54),
(1992, 10.82),
(1996, 10.94),
(2000, 11.12),
(2004, 10.93),
(2008, 10.78)
]
Main Code
Note that the value of C is important here: with C=1 you don't get the desired result.
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
X = np.array(male + female)
Y = np.array([0] * len(male) + [1] * len(female))
# fit the model
clf = svm.SVC(kernel='linear', C=1000)  # C is important here
clf.fit(X, Y)
# separating hyperplane w[0]*x + w[1]*y + b = 0, i.e. y = a*x - b/w[1]
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(1890, 2010)
yy = a * xx - clf.intercept_[0] / w[1]
plt.figure(figsize=(8, 4))
plt.plot(xx, yy, "k-")  # this is the separator line
plt.scatter(X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.cm.Paired,
            edgecolors="k")
plt.xlim((1890, 2010))
plt.ylim((9, 13))
plt.show()
I believe your idea of using regression lines is correct: without them, the line would be merely superficial (and impossible to justify if the points overlap, as they would with messy data).
Therefore, using some randomly generated data with a known linear relationship, we can do the following:
import random
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
x_values = np.arange(0, 51, 1)
# two noisy linear groups, one above the other
y_points_1 = [i * 2 + random.randint(5, 30) for i in x_values]
y_points_2 = [i - random.randint(5, 30) for i in x_values]
# sklearn expects a 2-D feature array of shape (n_samples, n_features)
x_points = x_values.reshape(-1, 1)
def regression(x, y):
    # fit a straight line and return its predictions at x
    model = LinearRegression().fit(x, y)
    return model.predict(x)
fit_1 = regression(x=x_points, y=y_points_1)
fit_2 = regression(x=x_points, y=y_points_2)
# the separator is the pointwise average of the two regression lines
barrier = (fit_1 + fit_2) / 2
plt.plot(x_points, fit_1)
plt.plot(x_points, fit_2)
plt.plot(x_points, barrier)
plt.scatter(x_values, y_points_1)
plt.scatter(x_values, y_points_2)
plt.grid(True)
plt.show()
Giving us the following plot:
This method also works when the data points overlap, so if we change the random data slightly and apply the same process:
x_values = np.arange(0, 51, 1)
y_points_1 = [i * 2 + random.randint(-10, 30) for i in x_values]
y_points_2 = [i - random.randint(-10, 30) for i in x_values]
We get something like the following:
It is important to note that the lists used here are of the same length, so you would need to add some predicted points to the female data after applying regression in order to make use of the line between them. These points would merely be along the regression line with the x-values corresponding to those present in the male data.
Because sklearn might be a bit over the top for a linear fit, and to get rid of the requirement that the male and female data have the same number of points, here is the same approach implemented with numpy.polyfit. It also demonstrates that averaging the two regression lines does not actually solve the problem.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
#data import
male = pd.read_csv("test1.txt", sep=r"\s+")
female = pd.read_csv("test2.txt", sep=r"\s+")
#linear fit of both populations
pmale = np.polyfit(male.Year, male.Time, 1)
pfemale = np.polyfit(female.Year, female.Time, 1)
#more appealing presentation, let's pretend we do not just fit a line
x_fitmin=min(male.Year.min(), female.Year.min())
x_fitmax=max(male.Year.max(), female.Year.max())
x_fit=np.linspace(x_fitmin, x_fitmax, 100)
#create functions for the three fit lines
male_fit = np.poly1d(pmale)
print(male_fit)
female_fit = np.poly1d(pfemale)
print(female_fit)
sep = np.poly1d(np.mean([pmale, pfemale], axis=0))
print(sep)
#plot all markers and lines
ax = male.plot(x="Year", y="Time", c="blue", kind="scatter", label="male")
female.plot(x="Year", y="Time", c="red", kind="scatter", ax=ax, label="female")
ax.plot(x_fit, male_fit(x_fit), c="blue", ls="dotted", label="male fit")
ax.plot(x_fit, female_fit(x_fit), c="red", ls="dotted", label="female fit")
ax.plot(x_fit, sep(x_fit), c="black", ls="dashed", label="separator")
plt.legend()
plt.show()
Sample output:
-0.01333 x + 36.42
-0.01507 x + 40.92
-0.0142 x + 38.67
And one point is still on the wrong side. However, I find this question interesting because I expected answers from the sklearn crowd for non-linear data groups. I even installed sklearn in anticipation! If nobody posts a good solution with SVMs in the next few days, I will set a bounty on this question.
One solution is the geometrical approach: find the convex hull of each data class, then find a line that passes between the two convex hulls. To find such a line, you can find an inner tangent line between the two convex hulls using this code, and rotate it a little.
You can use the following code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull
# male and female are the lists from the data entry above
male = np.array(male)
female = np.array(female)
hull_male = ConvexHull(male)
hull_female = ConvexHull(female)
plt.plot(male[:, 0], male[:, 1], 'o')
for simplex in hull_male.simplices:
    plt.plot(male[simplex, 0], male[simplex, 1], 'k-')
# Here, the separator line comes from the SVM result,
# just to show a separator as an example:
# plt.plot(xx, yy, "k-")
plt.plot(female[:, 0], female[:, 1], 'o')
for simplex in hull_female.simplices:
    plt.plot(female[simplex, 0], female[simplex, 1], 'k-')
plt.xlim((1890, 2010))
plt.ylim((9, 13))
plt.show()
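The convex-hull view also gives a cheap way to check any candidate separator: a straight line separates two point sets exactly when it separates the vertices of their convex hulls, because each side of a line is convex. The hypothetical helper separates below uses that with scipy.spatial.ConvexHull to test a line y = slope * x + intercept; it assumes each class has at least three points that are not all collinear, which ConvexHull needs in 2-D.

```python
import numpy as np
from scipy.spatial import ConvexHull


def separates(slope, intercept, below, above):
    """Return True if y = slope * x + intercept strictly separates the sets.

    Every point of `below` must lie under the line and every point of
    `above` over it. Only convex-hull vertices are tested: a half-plane
    contains a point set exactly when it contains that set's hull vertices.
    """
    below = np.asarray(below, dtype=float)
    above = np.asarray(above, dtype=float)
    vb = below[ConvexHull(below).vertices]
    va = above[ConvexHull(above).vertices]
    under = np.all(vb[:, 1] < slope * vb[:, 0] + intercept)
    over = np.all(va[:, 1] > slope * va[:, 0] + intercept)
    return bool(under and over)


# Toy example: a unit square of points below y = 2 and another square above it.
lower = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
upper = [(0, 3), (1, 3), (0, 4), (1, 4)]
print(separates(0.0, 2.0, lower, upper))  # True
print(separates(0.0, 0.5, lower, upper))  # False
```

For example, with the SVM answer's variables, separates(a, -clf.intercept_[0] / w[1], male, female) should return True if that line really puts every male time below it and every female time above it.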

How to check which points are inside a circle?

I have a dataframe df that contains the distances between all the points (IDs) in my system. So the df looks like the following:
df
radius ID1 ID2 x1 y1 x2 y2
0 0.454244 100 103 103.668919 1.335309 103.671812 1.332424
1 1.016734 100 123 103.668919 1.335309 103.677598 1.332424
2 0.643200 103 123 103.671812 1.332424 103.677598 1.332424
3 1.605608 100 124 103.668919 1.335309 103.677598 1.346851
4 1.728349 103 124 103.671812 1.332424 103.677598 1.346851
I want to compute a circle for each pair of points and then check which points are inside that circle. The coordinates of each point are in a separate dataframe, coordinates.
coordinates
ID x y
0 100 103.668919 1.335309
1 103 103.671812 1.332424
2 124 103.677598 1.346851
3 125 103.677598 1.349737
4 134 103.680491 1.341080
5 135 103.680491 1.343966
6 136 103.680491 1.346851
7 137 103.680491 1.349737
8 138 103.680491 1.352622
9 146 103.683384 1.341080
Here is the code:
from matplotlib.patches import Circle
for i in df.index:
    x = df.x1[i]
    y = df.y1[i]
    circ = Circle((x, y), radius = df.radius)
    ## it works until here: from now I need to understand what to do
    ## and in particular I need to find which points are inside the circle
    points = circ.contains_point([coordinates.x, coordinates.y])
which returns the error
ValueError: setting an array element with a sequence.
When I have issues like this, I always do a small sanity test:
from matplotlib.patches import Circle
circ = Circle((0, 0), radius = 1)
print(circ.contains_point([0.5,0.5]))
print(circ.contains_point([2,2]))
I get (as expected)
True
False
So coordinates.x and coordinates.y are probably arrays, which explains the message.
contains_point works on a single point: a tuple or list of 2 scalars.
To generate your list, you could loop within a list comprehension:
points = [(x, y) for x, y in zip(coordinates.x, coordinates.y) if circ.contains_point((x, y))]
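If all you need is "which points lie within distance r of a centre", a plain distance test in pandas handles every point at once without going through matplotlib (matplotlib patches also have a plural contains_points method that accepts an (N, 2) array). One caution, hedged because the units are not stated: in the sample data the radius values (about 0.45 to 1.7) are far larger than the coordinate differences (thousandths), so they may well be in different units, e.g. kilometres versus degrees. If so, convert them to the same units first, or every point will fall inside every circle. The helper ids_inside_circle is hypothetical, and the radius of 0.005 below is an illustrative value in coordinate units:

```python
import pandas as pd


def ids_inside_circle(coords, cx, cy, r):
    """Return the IDs whose (x, y) lies within distance r of (cx, cy).

    Points on the boundary count as inside. Assumes x, y and r share the
    same planar units.
    """
    d2 = (coords["x"] - cx) ** 2 + (coords["y"] - cy) ** 2
    return coords.loc[d2 <= r ** 2, "ID"].tolist()


# First five rows of the question's coordinates dataframe.
coordinates = pd.DataFrame({
    "ID": [100, 103, 124, 125, 134],
    "x": [103.668919, 103.671812, 103.677598, 103.677598, 103.680491],
    "y": [1.335309, 1.332424, 1.346851, 1.349737, 1.341080],
})

# Hypothetical radius of 0.005 (same units as x and y) around point 100.
print(ids_inside_circle(coordinates, 103.668919, 1.335309, 0.005))  # [100, 103]
```

To apply it to every pair in df, loop over the rows, e.g. for row in df.itertuples(): inside = ids_inside_circle(coordinates, row.x1, row.y1, row.radius), once the radius is in the same units as x and y.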
