Place ellipse on seaborn catplot - python

I have a seaborn.catplot that looks like this:
What I am trying to do is highlight differences in the graph with the following rules:
If A-B > 4, color it green
If A-B < -1, color it red
If 0 <= A-B <= 2, color it blue
I am looking to produce something akin to the below image:
I have an MRE here:
# Stack Overflow Example
import numpy as np, pandas as pd, seaborn as sns
import matplotlib.pyplot as plt
from random import choice
from string import ascii_lowercase, digits
chars = ascii_lowercase + digits
lst = [''.join(choice(chars) for _ in range(2)) for _ in range(100)]
np.random.seed(8)
t = pd.DataFrame(
    {
        'Key': [''.join(choice(chars) for _ in range(2)) for _ in range(5)]*2,
        'Value': np.random.uniform(low=1, high=10, size=(10,)),
        'Type': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
    }
)
# catplot returns a FacetGrid (not an Axes)
g = sns.catplot(data=t, x='Value', y='Key', hue='Type', palette="dark").set(title="Stack Overflow Help Me")
plt.show()
I believe an ellipse will need to be plotted around the points of interest, and I have looked into some questions:
Creating a Confidence Ellipses in a sccatterplot using matplotlib
plot ellipse in a seaborn scatter plot
But none of them do this with catplot in particular, or customize the colors according to rules.
How can I achieve the desired result with my toy example?

You could create ellipses around the midpoint of A and B, using the distance between A and B, increased by some padding, as the width. The height should be a bit smaller than 1 (the spacing between two neighboring categories on the y-axis).
To get a fully opaque outline with a transparent fill, to_rgba() can be used to add an alpha value to the face color only. Setting the zorder to a low number puts the ellipse behind the scatter points.
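To see the effect of to_rgba() in isolation, here is a minimal matplotlib-only sketch (the ellipse position and size are arbitrary): the alpha passed to to_rgba() ends up only in the face color, while the edge color keeps full opacity.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
from matplotlib.patches import Ellipse

color = "crimson"
fig, ax = plt.subplots()
# Face color gets alpha 0.1 (mostly transparent); the edge color stays fully opaque.
ell = Ellipse(xy=(5, 0), width=3, height=0.8,
              fc=to_rgba(color, 0.1), ec=color, lw=1, zorder=0)
ax.add_patch(ell)
ax.set_xlim(0, 10)
ax.set_ylim(-1, 1)

print(ell.get_facecolor())  # RGBA with alpha 0.1
print(ell.get_edgecolor())  # RGBA with alpha 1.0
```

Passing alpha=0.1 to the Ellipse itself would instead make both the fill and the outline transparent.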
sns.scatterplot is an axes-level counterpart of the figure-level sns.catplot, and is easier to work with when there is only one subplot.
Making the Key column of type pd.Categorical gives a fixed relation between y-position and label.
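A small sketch of that fixed relation, using hypothetical key labels: a pd.Categorical built from strings gets sorted categories, and the index of each label in .categories is the integer y-position used when looping over df['Key'].cat.categories.

```python
import pandas as pd

# Hypothetical keys, deliberately unsorted and with repeats.
keys = pd.Categorical(["xb", "a7", "xb", "k3", "a7"])

print(list(keys.categories))  # sorted categories: ['a7', 'k3', 'xb']
print(keys.codes.tolist())    # position of each row's label: [2, 0, 2, 1, 0]

# Label -> y-position mapping, as in the answer's enumerate() loop.
y_pos = {label: i for i, label in enumerate(keys.categories)}
print(y_pos)  # {'a7': 0, 'k3': 1, 'xb': 2}
```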
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.colors import to_rgba
import seaborn as sns
import pandas as pd
import numpy as np
from string import ascii_lowercase, digits

chars = ascii_lowercase + digits
num = 9
df = pd.DataFrame({'Key': [''.join(np.random.choice([*chars], 2)) for _ in range(num)] * 2,
                   'Value': np.random.uniform(low=1, high=10, size=2 * num),
                   'Type': np.repeat(['A', 'B'], num)})
df['Key'] = pd.Categorical(df['Key'])  # make the key categorical for a consistent ordering
sns.set_style('white')
ax = sns.scatterplot(data=df, x='Value', y='Key', hue='Type', palette="dark")
# one row per key, one column per type (mean value if a key occurs more than once)
df_grouped = df.groupby(['Key', 'Type'], observed=True)['Value'].mean().unstack()
for y_pos, y_label in enumerate(df['Key'].cat.categories):
    A = df_grouped.loc[y_label, 'A']
    B = df_grouped.loc[y_label, 'B']
    dif = A - B
    color = 'limegreen' if dif > 4 else 'crimson' if dif < -1 else 'dodgerblue' if 0 <= dif <= 2 else None
    if color is not None:
        ell = Ellipse(xy=((A + B) / 2, y_pos), width=abs(dif) + 0.8, height=0.8,
                      fc=to_rgba(color, 0.1), lw=1, ec=color, zorder=0)
        ax.add_patch(ell)
plt.tight_layout()
plt.show()
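Since the question asks about catplot specifically: catplot returns a FacetGrid rather than an Axes, but with a single subplot the underlying Axes is available as g.ax, so the same ellipses can be added there. A sketch under these assumptions: fixed, hypothetical data so the outcome is predictable, kind='strip' with jitter=False so the dots sit exactly on the category rows, the blue rule read as 0 <= A-B <= 2, and rule_color as a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.colors import to_rgba
from matplotlib.patches import Ellipse


def rule_color(dif):
    """Hypothetical helper: map A-B to an ellipse color (None means no ellipse)."""
    if dif > 4:
        return "limegreen"
    if dif < -1:
        return "crimson"
    if 0 <= dif <= 2:
        return "dodgerblue"
    return None


# Hypothetical data: A-B is 6, -4, 1 and 3 for the four keys.
df = pd.DataFrame({"Key": ["k1", "k2", "k3", "k4"] * 2,
                   "Value": [9, 2, 5, 7, 3, 6, 4, 4],
                   "Type": ["A"] * 4 + ["B"] * 4})
df["Key"] = pd.Categorical(df["Key"])  # fixes the y-position of each key

g = sns.catplot(data=df, x="Value", y="Key", hue="Type", kind="strip",
                jitter=False, palette="dark")
ax = g.ax  # the single Axes inside the FacetGrid

df_grouped = df.groupby(["Key", "Type"], observed=True)["Value"].mean().unstack()
for y_pos, key in enumerate(df["Key"].cat.categories):
    A, B = df_grouped.loc[key, "A"], df_grouped.loc[key, "B"]
    color = rule_color(A - B)
    if color is not None:
        ax.add_patch(Ellipse(xy=((A + B) / 2, y_pos), width=abs(A - B) + 0.8,
                             height=0.8, fc=to_rgba(color, 0.1), ec=color,
                             lw=1, zorder=0))
g.set(title="Ellipses on a catplot")
plt.show()
```

With this data, k1, k2 and k3 get green, red and blue ellipses, while k4 (A-B = 3) matches none of the rules and is left unmarked.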

Related

Select the color of the bar in histogram plot based on its value

I have thousands of values and want to plot their histogram, with the color of each bar based on its value. My values are between 0 and 10, so I want the bar colors to go from red to green: close to 0 the color should be red, and close to 10 it should be green, like in the image I attached. In the simple example below, the bar for h should be close to green and the bar for b close to red. In reality I have many more bars and values.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
rating = [8, 4, 5,6]
objects = ('h', 'b', 'c','a')
y_pos = np.arange(len(objects))
plt.barh(y_pos, rating, align='center', alpha=0.5)
plt.yticks(y_pos, objects)
plt.show()
Could you please help me with this? Thank you.
To apply colors depending on values, matplotlib uses a colormap combined with a norm. The colormap maps values between 0 and 1 to a color, for example 0 to red, 0.5 to yellow and 1 to green (as the 'RdYlGn' colormap does). A norm maps values from a given range to the range 0 to 1, for example the minimum value to 0 and the maximum value to 1. Applying the colormap to the normalized values then gives the desired colors.
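The two steps can be checked separately. A short sketch using the ratings from the question; it uses 'RdYlGn' (not the reversed '_r' variant) so that low values map to red and high values to green, as the question asks.

```python
import numpy as np
import matplotlib.pyplot as plt

rating = np.array([8, 4, 5, 6])
norm = plt.Normalize(vmin=rating.min(), vmax=rating.max())
cmap = plt.get_cmap("RdYlGn")  # low -> red, middle -> yellow, high -> green

scaled = norm(rating)  # 8 -> 1.0, 4 -> 0.0, 5 -> 0.25, 6 -> 0.5
colors = cmap(scaled)  # one RGBA row per rating, shape (4, 4)
print(np.asarray(scaled))
print(colors.shape)
```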
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
rating = [8, 4, 5, 6]
objects = ('h', 'b', 'c', 'a')
y_pos = np.arange(len(objects))
cmap = plt.get_cmap('RdYlGn')  # low values red, high values green
norm = plt.Normalize(vmin=min(rating), vmax=max(rating))
plt.barh(y_pos, rating, align='center', color=cmap(norm(np.array(rating))))
plt.yticks(y_pos, objects)
plt.show()
Alternatively, the seaborn library could be used for a little bit simpler approach:
import seaborn as sns
import matplotlib.pyplot as plt
rating = [8, 4, 5, 6]
objects = ['h', 'b', 'c', 'a']
ax = sns.barplot(x=rating, y=objects, hue=rating, palette='RdYlGn', dodge=False)
plt.show()
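Since the question is about a histogram of thousands of values rather than a bar chart, the same colormap-plus-norm idea can be applied to the bars returned by plt.hist(): color each bar by its bin center. A sketch assuming values in the range 0 to 10, with random data standing in for the real values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.uniform(0, 10, size=5000)  # stand-in for the real values in 0..10

cmap = plt.get_cmap("RdYlGn")          # 0 -> red, 1 -> green
norm = plt.Normalize(vmin=0, vmax=10)  # fixed range, as described in the question

counts, bin_edges, patches = plt.hist(data, bins=20, range=(0, 10))
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
for center, patch in zip(bin_centers, patches):
    patch.set_facecolor(cmap(norm(center)))  # color each bar by the value it represents
plt.xlabel("value")
plt.ylabel("count")
plt.show()
```

Using a fixed 0 to 10 norm (instead of the data minimum and maximum) keeps the colors comparable between datasets.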

Side-by-side boxplots from two pandas in one figure

I have two pandas dataframes containing data for three different categories: 'a', 'b' and 'c'.
import pandas as pd
import numpy as np
n=100
df_a = pd.DataFrame({'id': np.ravel([['a' for i in range(n)], ['b' for i in range(n)], ['c' for i in range(n)]]),
'val': np.random.normal(0, 1, 3*n)})
df_b = pd.DataFrame({'id': np.ravel([['a' for i in range(n)], ['b' for i in range(n)], ['c' for i in range(n)]]),
'val': np.random.normal(1, 1, 3*n)})
I would like to illustrate the differences in 'a', 'b' and 'c' between the two dataframes, and for that I want to use boxplots. I.e., for each category ('a', 'b' and 'c'), I want to make side-by-side boxplots - and they should all be in the same figure.
So one figure containing 6 boxplots, 2 per category. What is the easiest way to achieve this?
IIUC:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 2)
# rows: categories 'a', 'b', 'c'; columns: df_a, df_b
for j, df in enumerate([df_a, df_b]):
    for i, cat in enumerate(sorted(df['id'].unique())):
        df[df['id'] == cat].boxplot('val', 'id', ax=axes[i, j])
plt.tight_layout()
plt.show()
Does this help? I tried to make it somewhat dynamic and flexible:
import matplotlib.pyplot as plt
import seaborn as sns
# each id twice: even positions for df_a, odd positions for df_b
ids = [val for val in df_a["id"].unique() for _ in (0, 1)]
fig, ax = plt.subplots(len(ids) // 2, 2, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.3)
plt.suptitle("df_a vs. df_b")
ax = ax.ravel()
for i, cat in enumerate(ids):
    if i % 2 == 0:
        ax[i] = sns.boxplot(x=df_a[df_a.id == cat]["val"], ax=ax[i])
    else:
        ax[i] = sns.boxplot(x=df_b[df_b.id == cat]["val"], ax=ax[i])
    ax[i].set_title(cat)
sns.despine()
plt.show()
You could add an extra column to indicate the dataset and then concatenate the dataframes:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
n = 100
df_a = pd.DataFrame({'id': np.ravel([['a' for i in range(n)], ['b' for i in range(n)], ['c' for i in range(n)]]),
'val': np.random.normal(0, 1, 3 * n)})
df_b = pd.DataFrame({'id': np.ravel([['a' for i in range(n)], ['b' for i in range(n)], ['c' for i in range(n)]]),
'val': np.random.normal(1, 1, 3 * n)})
df_a['dataset'] = 'set a'
df_b['dataset'] = 'set b'
sns.boxplot(data=pd.concat([df_a, df_b]), x='id', y='val', hue='dataset', palette='spring')
plt.tight_layout()
plt.show()
PS: Note that in matplotlib (and seaborn, which builds upon it), a figure can contain one or more subplots (referred to as axes, usually named ax). Because you wrote "figure" instead of "plot", it might seem that you want multiple subplots. In that case, you can use sns.catplot(..., kind='box') to create multiple subplots from the concatenated dataframe.
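A sketch of that catplot variant, reusing the data setup from the answer (np.repeat is just a shorter way to build the same id column): col='id' gives one subplot per category, each showing the two datasets side by side.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

n = 100
ids = np.repeat(["a", "b", "c"], n)
df_a = pd.DataFrame({"id": ids, "val": np.random.normal(0, 1, 3 * n)})
df_b = pd.DataFrame({"id": ids, "val": np.random.normal(1, 1, 3 * n)})
df_a["dataset"] = "set a"
df_b["dataset"] = "set b"
both = pd.concat([df_a, df_b], ignore_index=True)

# One subplot per category ('a', 'b', 'c'), each with one box per dataset.
g = sns.catplot(data=both, x="dataset", y="val", col="id", kind="box")
plt.show()
```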

Density clustering around a separate point - Python

I'm aiming to cluster xy points based on their proximity. Specifically, grouping points that are positioned closely to each other. I'm also hoping to cluster the data around a separate reference point.
Note: I have multiple sets of data that need to be clustered independently. For example, in the data below, each unique value in Item signifies a different set of data. I could have multiple unique sets of data that all vary in sparsity. Therefore, any technique that requires a predetermined number of clusters isn't realistic, as I'd have to manually check the fit and adjust the appropriate parameter every time.
As such, the best method thus far has been some form of density clustering (DBSCAN, OPTICS).
However, while clustering points that are close together, I'm hoping to apply some cut-off that keeps the intended cluster spherical. On the other hand, I don't want to shrink the reachable area too much: points that are close to the reference point and to the core points are then discarded because of a small gap, even though I want to include them.
The figures below illustrate the dilemma. Item 1 shows that the reachable area should be smaller so that the cluster around the reference point stays spherical, while Item 2 shows that the reachable area needs to be larger so that points within the dense area are included.
I'm hoping I can adjust a parameter or include a separate feature rather than force it. Because the dense area around the reference point can vary, I'm reluctant to exclude every point outside a fixed radius.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
import seaborn as sns
from sklearn.cluster import OPTICS
fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)
df = pd.DataFrame({
'Item' : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,10.0,-2.0,2.0,5.0,7.5,15.0,0.0,-22.0,-20.0,-20.0,-6.5,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0,-2.0,0.0,3.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-12.0,20.5,6.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-7.0,-0.5,-10.5,-7.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0,4.0,-2.0,0.0,0.0,2.5,2.0,-1.5,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,6.0,-20.0,2.0,-17.5,-15,19.0,20.0],
'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],
})
# not spherical
df = df[df['Item'] == 1].copy()
# spherical but reachable area too small
#df = df[df['Item'] == 2].copy()
df['distance'] = np.sqrt((df['X_Ref'] - df['x'])**2 + (df['Y_Ref'] - df['y'])**2)
Y_sklearn = df[['x','y']].values
ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)
#clusterer = DBSCAN(eps = 7.5, min_samples = 3)
#labels_clusters = clusterer.fit_predict(Y_sklearn)
clusterer = OPTICS(min_samples = 2, xi = 0.25, min_cluster_size = 0.25, max_eps = 5)
labels_clusters = clusterer.fit_predict(Y_sklearn)  # fits and returns the labels in one step
#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')
sns.scatterplot(data = df,
x = 'x',
y = 'y',
hue = 'cluster',
ax = ax,
legend = 'full',
)
plt.show()
Item 1: points to the right of radius should be excluded from core points
Item 2: points within radius should be included in core points
I believe the problem could be reformulated; I am not sure clustering is the best approach.
By clustering using distance
"""
https://stackoverflow.com/questions/66099958/density-clustering-around-a-separate-point-python
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
# not spherical
df = pd.DataFrame({
'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,12.0,-2.0,2.0,8.0,8.5,15.0,-20.0,-22.0,-20.0,-20.0,-10.0,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-8.0,-0.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0],
'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
})
# spherical but reachable area too small
df1 = pd.DataFrame({
'x' : [-2.0,0.0,2.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-15.0,20.5,8.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
'y' : [4.0,-2.0,0.0,0.0,2.5,2.0,-2.0,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,5.0,-20.0,2.0,-17.5,-15,19.0,20.0],
'X_Ref' : [-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
'Y_Ref' : [-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],
})
# Distance from each point to its reference point
def distance_func(df):
    return np.sqrt((df['X_Ref'] - df['x']) ** 2 + (df['Y_Ref'] - df['y']) ** 2)
df['distance'] = distance_func(df)
df1['distance'] = distance_func(df1)
# Change this for the graphs
df = df1.copy()
Y_sklearn = df['distance'].values.reshape(-1, 1)
fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)
ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)
clusterer = KMeans(init='k-means++', n_clusters=2, n_init=10)
labels_clusters = clusterer.fit_predict(Y_sklearn)  # 1-D k-means on the distance only
#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')
sns.scatterplot(data = df,
x = 'x',
y = 'y',
hue = 'cluster',
ax = ax,
legend = 'full',
)
plt.show()
For df:
For df1:
By using marginal increase of area
As mentioned earlier, I believe the problem could be reformulated using the idea of marginal area: each point we add increases the area under consideration by a different amount.
In other words, use the elbow method for each point.
As a proxy for the area, I will simply use the squared distance (the area of a circle is proportional to the square of its radius).
Code:
"""
https://stackoverflow.com/questions/66099958/density-clustering-around-a-separate-point-python
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# not spherical
df = pd.DataFrame({
'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,12.0,-2.0,2.0,8.0,8.5,15.0,-20.0,-22.0,-20.0,-20.0,-10.0,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-8.0,-0.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0],
'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
})
# spherical but reachable area too small
df1 = pd.DataFrame({
'x' : [-2.0,0.0,2.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-15.0,20.5,8.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
'y' : [4.0,-2.0,0.0,0.0,2.5,2.0,-2.0,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,5.0,-20.0,2.0,-17.5,-15,19.0,20.0],
'X_Ref' : [-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
'Y_Ref' : [-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],
})
def distance_func(df):
    return np.sqrt((df['X_Ref'] - df['x']) ** 2 + (df['Y_Ref'] - df['y']) ** 2)
df['distance'] = distance_func(df)
df1['distance'] = distance_func(df1)
# To switch from one dataset to the other:
#df = df1.copy()
df['distance_2'] = df['distance']**2  # squared distance as a proxy for the enclosed area
df.sort_values('distance', inplace=True)
aux = pd.DataFrame(df['distance_2'].values, columns=['distance ** 2'])
aux.plot()  # scree plot: squared distance of the k-th closest point
fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)
ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)
selected_top=10
labels_clusters = np.zeros(df.shape[0])
labels_clusters[0:selected_top] =1
#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')
sns.scatterplot(data = df,
x = 'x',
y = 'y',
hue = 'cluster',
ax = ax,
legend = 'full',
)
plt.show()
For df:
Scree plot
From the scree plot you can see where adding more points starts to increase the area too much. I would say that selecting 10 points is a good choice; the selection is based on the elbow method.
Final plot:
For df1:
Scree plot:
Following the elbow method, 13 points could be optimal.
Final plot:
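The elbow above is read off the scree plot by eye. If this has to be repeated for many datasets (the question mentions many Items), one option is to locate the elbow automatically. A sketch of one simple heuristic, not the method used in this answer: scale the sorted squared distances to [0, 1] and pick the point that lies furthest below the straight line joining the two ends of the curve (the idea behind knee-detection methods such as Kneedle). select_by_elbow is a hypothetical helper and the distances are made up.

```python
import numpy as np


def select_by_elbow(distances):
    """Hypothetical helper: how many of the closest points to keep.

    Sorted squared distances (area proxy) are scaled to [0, 1] on both axes;
    the elbow is the point furthest below the chord y = x joining the ends.
    Assumes at least two distinct distances.
    """
    area = np.sort(np.asarray(distances, dtype=float)) ** 2
    x = np.linspace(0.0, 1.0, len(area))
    y = (area - area[0]) / (area[-1] - area[0])
    gap = x - y  # vertical gap between the chord and the curve
    return int(np.argmax(gap)) + 1


# Hypothetical distances: ten points near the reference, five far away.
distances = [0.5, 1, 1.2, 1.5, 2, 2.2, 2.5, 3, 3.1, 3.3, 15, 18, 20, 22, 25]
print(select_by_elbow(distances))  # -> 10
```

With the answer's data, selected_top = select_by_elbow(df['distance']) could replace the hard-coded value, though the result is worth checking against the scree plot.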

Line plot with marker at final point

I am looking to produce a graph plotting the points of particles under the action of gravity and am currently producing a plot as below:
However, I would like to produce a clearer plot showing a line for the path of the particles and a marker at the final point indicating their final positions, like in the plot below:
My current line of code plotting each line is:
plt.plot(N_pos[:,0] * AU, N_pos[:,1], 'o')
This just plots the x and y coordinates from an array holding the x, y and z coordinates of each particle.
Is the simplest way to do this to remove the 'o' marker from the code and plot the last position of each particle again, this time with a marker? If so, how do I make the line and the final marker the same colour, instead of what happens with the code below?
for i in range(len(all_positions[0])):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0] , N_pos[:,1])
    plt.plot(N_pos[:,0][-1] , N_pos[:,1][-1], 'o')
When no explicit color is given, plt.plot() cycles through a list of default colors.
A simple solution would be to extract the color from the lineplot and provide it as the color for the dot:
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randn(200, 10, 1).cumsum(axis=0) * 0.1
all_positions = np.dstack([np.sin(a), np.cos(a)]).cumsum(axis=0)
for i in range(len(all_positions[0])):
    N_pos = all_positions[:, i]
    line, = plt.plot(N_pos[:, 0], N_pos[:, 1])
    plt.plot(N_pos[:, 0][-1], N_pos[:, 1][-1], 'o', color=line.get_color())
plt.show()
Another option would be to create a scatter plot and set the size of each dot via an array, for example size 1 for the first N-1 points and size 20 for the last one:
for i in range(len(all_positions[0])):
    N_pos = all_positions[:, i]
    plt.scatter(N_pos[:, 0], N_pos[:, 1], s=np.append(np.ones(len(N_pos) - 1), 20))
plt.show()
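Another option, not taken from the answers here: draw the line and the marker in a single plot() call and use the Line2D markevery property so the marker appears only at the last point. Because it is one artist, line and marker necessarily share the color taken from the default cycle. The synthetic trajectories reuse the setup from the first answer.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic trajectories as in the first answer: 200 steps, 10 particles, (x, y).
a = np.random.randn(200, 10, 1).cumsum(axis=0) * 0.1
all_positions = np.dstack([np.sin(a), np.cos(a)]).cumsum(axis=0)

fig, ax = plt.subplots()
for i in range(all_positions.shape[1]):
    N_pos = all_positions[:, i]
    # One Line2D per particle; markevery=[last index] draws the marker only at
    # the final position, so line and marker share a single cycle color.
    ax.plot(N_pos[:, 0], N_pos[:, 1], marker="o", markevery=[len(N_pos) - 1])
plt.show()
```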
You can define your own color palette and give each trace its unique(ish) color:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
np.random.seed(123)
all_positions = np.random.randn(10, 5, 2).cumsum(axis=0) #shamelessly stolen from JohanC
l = all_positions.shape[1]
my_cmap = cm.plasma
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0], N_pos[:,1], c=my_cmap(i/l))
    plt.plot(N_pos[:,0][-1], N_pos[:,1][-1], 'o', color=my_cmap(i/l))
plt.show()
Output:
You can reset the color cycler and plot the markers in a second round (not recommended, just to illustrate cycler properties):
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
all_positions = np.random.randn(10, 5, 2).cumsum(axis=0)
l = all_positions.shape[1]
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0], N_pos[:,1])
plt.gca().set_prop_cycle(None)  # reset the color cycle so the markers reuse the line colors
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0][-1], N_pos[:,1][-1], 'o')
plt.show()
Sample output:
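A related approach, sketched under the assumption that the default color cycle is in use: matplotlib's "CN" color strings ('C0', 'C1', ...) index into that cycle, so passing the same f"C{i}" to both the line and the marker keeps them matched without resetting the cycler.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)
all_positions = np.random.randn(10, 5, 2).cumsum(axis=0)

fig, ax = plt.subplots()
for i in range(all_positions.shape[1]):
    N_pos = all_positions[:, i]
    color = f"C{i}"  # i-th color of the default property cycle
    ax.plot(N_pos[:, 0], N_pos[:, 1], color=color)         # trajectory
    ax.plot(N_pos[-1, 0], N_pos[-1, 1], "o", color=color)  # final position
plt.show()
```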

How to label these points on the scatter plot

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
# pass ax=ax so the scatter is drawn on the resized axes instead of a new figure
df.plot.scatter(x='Age',
                y='Pos',
                c='DarkBlue', xticks=([15,20,25,30,35,40]), ax=ax)
plt.show()
I got the plot, but I am not able to label the points.
Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
    plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
The above example gives the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
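A sketch applying the same idea to a DataFrame with the question's column names, using hypothetical player data since the spreadsheet isn't available. As a swapped-in alternative to the data-unit offset, it uses ax.annotate() with textcoords='offset points', so the label offset is given in points and does not need adjusting when the axis range changes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the spreadsheet; column names follow the question.
df = pd.DataFrame({"Player": ["Ann", "Ben", "Cal"],
                   "Pos": ["FW", "MF", "DF"],
                   "Age": [22, 31, 27]})

fig, ax = plt.subplots(figsize=(7, 3))
ax.scatter(df["Age"], df["Pos"], c="darkblue")
for row in df.itertuples(index=False):
    # Label each point with the player's name, shifted 4 points right and up.
    ax.annotate(row.Player, xy=(row.Age, row.Pos),
                xytext=(4, 4), textcoords="offset points")
ax.set_xticks([15, 20, 25, 30, 35, 40])
plt.show()
```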
EDIT
For non-numerical values, you can simply use the string as the coordinate, like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
    plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:
